Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:11 +0000 (14:50 -0700)]
kmod: throttle kmod thread limit
If we reach the limit of modprobe_limit threads running the next
request_module() call will fail. The original reason for adding a kill
was to do away with possible issues with in old circumstances which would
create a recursive series of request_module() calls.
We can do better than just be super aggressive and reject calls once we've
reached the limit by simply making pending callers wait until the
threshold has been reduced, and then throttling them in, one by one.
This throttling enables requests over the kmod concurrent limit to be
processed once a pending request completes. Only the first item queued up
to wait is woken up. The assumption here is once a task is woken it will
have no other option to also kick the queue to check if there are more
pending tasks -- regardless of whether or not it was successful.
By throttling and processing only max kmod concurrent tasks we ensure we
avoid unexpected fatal request_module() calls, and we keep memory
consumption on module loading to a minimum.
With x86_64 qemu, with 4 cores, 4 GiB of RAM it takes the following run
time to run both tests:
time ./kmod.sh -t 0008
real 0m16.366s
user 0m0.883s
sys 0m8.916s
time ./kmod.sh -t 0009
real 0m50.803s
user 0m0.791s
sys 0m9.852s
Link: http://lkml.kernel.org/r/20170628223155.26472-4-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:08 +0000 (14:50 -0700)]
kmod: add test driver to stress test the module loader
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests
early (refer to the force_init_test module parameter), however since a
lot of test can get a system out of memory fast we leave this disabled
for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel OOM
fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls only
allow each one parameter a test driver for these is rather simple.
Other factors that can help out test driver though are the number of
calls we issue and knowing current limitations of each. This exposes
configuration as much as possible through userspace to be able to build
tests directly from userspace.
Since it allows multiple misc devices its will eventually (once we add a
knob to let us create new devices at will) also be possible to perform
more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
XXX: add test restult for 0007
Test completed
You can also request for specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the current available number of tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them. If
you use these paritions for your rootfs you can change the default test
driver used for get_fs_type() by exporting it into your environment. For
example of other test defaults you can override refer to kmod.sh
allow_user_defaults().
Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build different test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized
version of module order for us. Once it finds the modules it needs, as
per depmod symbol dep map, it will finit_module() the respective
modules which are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother userspace
as the module already is loaded. This is all handled by kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for "fs-xfs",
not for "xfs". The optimization in kernel described in b) fails to
catch if we have a lot of consecutive get_fs_type() calls. The reason
is the optimization in place does not look for aliases. This means two
consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of
memory) we have a period of time during which module request for the
same name either with request_module() or get_fs_type() will *also*
fail to load even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type()
call does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures with
certain tests. For instance test 0006 fails on qemu with 1024 MiB of RAM.
It panics a box after reaping all userspace processes and still not
having enough memory to reap.
[arnd@arndb.de: add dependencies for test module]
Link: http://lkml.kernel.org/r/20170630154834.3689272-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170628223155.26472-3-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis R. Rodriguez [Fri, 14 Jul 2017 21:50:05 +0000 (14:50 -0700)]
MAINTAINERS: give kmod some maintainer love
As suggested by Jessica, I've been actively working on kmod, so might as
well reflect its maintained status.
Changes are expected to go through akpm's tree.
Link: http://lkml.kernel.org/r/20170628223155.26472-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Suggested-by: Jessica Yu <jeyu@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tobias Klauser [Fri, 14 Jul 2017 21:50:03 +0000 (14:50 -0700)]
xtensa: use generic fb.h
The arch uses a verbatim copy of the asm-generic version and does not
add any own implementations to the header, so use asm-generic/fb.h
instead of duplicating code.
Link: http://lkml.kernel.org/r/20170517083545.2115-1-tklauser@distanz.ch
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:50:00 +0000 (14:50 -0700)]
fault-inject: add /proc/<pid>/fail-nth
fail-nth interface is only created in /proc/self/task/<current-tid>/.
This change also adds it in /proc/<pid>/.
This makes shell based tool a bit simpler.
$ bash -c "builtin echo 100 > /proc/self/fail-nth && exec ls /"
Link: http://lkml.kernel.org/r/1491490561-10485-6-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:57 +0000 (14:49 -0700)]
fault-inject: simplify access check for fail-nth
The fail-nth file is created with 0666 and the access is permitted if
and only if the task is current.
This file is owned by the currnet user. So we can create it with 0644
and allow the owner to write it. This enables to watch the status of
task->fail_nth from another processes.
[akinobu.mita@gmail.com: don't convert unsigned type value as signed int]
Link: http://lkml.kernel.org/r/1492444483-9239-1-git-send-email-akinobu.mita@gmail.com
[akinobu.mita@gmail.com: avoid unwanted data race to task->fail_nth]
Link: http://lkml.kernel.org/r/1499962492-8931-1-git-send-email-akinobu.mita@gmail.com
Link: http://lkml.kernel.org/r/1491490561-10485-5-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:54 +0000 (14:49 -0700)]
fault-inject: make fail-nth read/write interface symmetric
The read interface for fail-nth looks a bit odd. Read from this file
returns "NYYYY..." or "YYYYY..." (this makes me surprise when cat this
file). Because there is no EOF condition. The first character
indicates current->fail_nth is zero or not, and then current->fail_nth
is reset to zero.
Just returning task->fail_nth value is more natural to understand.
Link: http://lkml.kernel.org/r/1491490561-10485-4-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:52 +0000 (14:49 -0700)]
fault-inject: parse as natural 1-based value for fail-nth write interface
The value written to fail-nth file is parsed as 0-based. Parsing as
one-based is more natural to understand and it enables to cancel the
previous setup by simply writing '0'.
This change also converts task->fail_nth from signed to unsigned int.
Link: http://lkml.kernel.org/r/1491490561-10485-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Fri, 14 Jul 2017 21:49:49 +0000 (14:49 -0700)]
fault-inject: automatically detect the number base for fail-nth write interface
Automatically detect the number base to use when writing to fail-nth
file instead of always parsing as a decimal number.
Link: http://lkml.kernel.org/r/1491490561-10485-2-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kefeng Wang [Fri, 14 Jul 2017 21:49:46 +0000 (14:49 -0700)]
kernel/watchdog.c: use better pr_fmt prefix
After commit
73ce0511c436 ("kernel/watchdog.c: move hardlockup
detector to separate file"), 'NMI watchdog' is inappropriate in
kernel/watchdog.c, using 'watchdog' only.
Link: http://lkml.kernel.org/r/1499928642-48983-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Babu Moger <babu.moger@oracle.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Luis de Bethencourt [Fri, 14 Jul 2017 21:49:44 +0000 (14:49 -0700)]
MAINTAINERS: move the befs tree to kernel.org
Update the location of the befs git tree and my email address.
Link: http://lkml.kernel.org/r/20170709110012.2991-1-luisbg@kernel.org
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Ellerman [Fri, 14 Jul 2017 21:49:41 +0000 (14:49 -0700)]
lib/atomic64_test.c: add a test that atomic64_inc_not_zero() returns an int
atomic64_inc_not_zero() returns a "truth value" which in C is
traditionally an int. That means callers are likely to expect the
result will fit in an int.
If an implementation returns a "true" value which does not fit in an
int, then there's a possibility that callers will truncate it when they
store it in an int.
In fact this happened in practice, see commit
966d2b04e070
("percpu-refcount: fix reference leak during percpu-atomic transition").
So add a test that the result fits in an int, even when the input
doesn't. This catches the case where an implementation just passes the
non-zero input value out as the result.
Link: http://lkml.kernel.org/r/1499775133-1231-1-git-send-email-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Helge Deller [Fri, 14 Jul 2017 21:49:38 +0000 (14:49 -0700)]
mm: fix overflow check in expand_upwards()
Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.
Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.
Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.
Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes:
bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 Jul 2017 21:35:37 +0000 (14:35 -0700)]
Merge tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"Stable bugfixes:
- Fix -EACCESS on commit to DS handling
- Fix initialization of nfs_page_array->npages
- Only invalidate dentries that are actually invalid
Features:
- Enable NFSoRDMA transparent state migration
- Add support for lookup-by-filehandle
- Add support for nfs re-exporting
Other bugfixes and cleanups:
- Christoph cleaned up the way we declare NFS operations
- Clean up various internal structures
- Various cleanups to commits
- Various improvements to error handling
- Set the dt_type of . and .. entries in NFS v4
- Make slot allocation more reliable
- Fix fscache stat printing
- Fix uninitialized variable warnings
- Fix potential list overrun in nfs_atomic_open()
- Fix a race in NFSoRDMA RPC reply handler
- Fix return size for nfs42_proc_copy()
- Fix against MAC forgery timing attacks"
* tag 'nfs-for-4.13-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (68 commits)
NFS: Don't run wake_up_bit() when nobody is waiting...
nfs: add export operations
nfs4: add NFSv4 LOOKUPP handlers
nfs: add a nfs_ilookup helper
nfs: replace d_add with d_splice_alias in atomic_open
sunrpc: use constant time memory comparison for mac
NFSv4.2 fix size storage for nfs42_proc_copy
xprtrdma: Fix documenting comments in frwr_ops.c
xprtrdma: Replace PAGE_MASK with offset_in_page()
xprtrdma: FMR does not need list_del_init()
xprtrdma: Demote "connect" log messages
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
xprtrdma: Don't defer MR recovery if ro_map fails
xprtrdma: Fix FRWR invalidation error recovery
xprtrdma: Fix client lock-up after application signal fires
xprtrdma: Rename rpcrdma_req::rl_free
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
xprtrdma: Pre-mark remotely invalidated MRs
xprtrdma: On invalidation failure, remove MWs from rl_registered
...
Linus Torvalds [Thu, 13 Jul 2017 21:27:32 +0000 (14:27 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"It's been usually busy for summer, with most of the efforts centered
around TCMU developments and various target-core + fabric driver bug
fixing activities. Not particularly large in terms of LoC, but lots of
smaller patches from many different folks.
The highlights include:
- ibmvscsis logical partition manager support (Michael Cyr + Bryant
Ly)
- Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
nab)
- Add support for TMR percpu LUN reference counting (nab)
- Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
(Bart)
- Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)
- Fix TMCU module removal (Xiubo Li)
- Fix iser-target OOPs during login failure (Andrea Righi + Sagi)
- Breakup target-core free_device backend driver callback (mnc)
- Perform TCMU add/delete/reconfig synchronously (mnc)
- Fix TCMU multiple UIO open/close sequences (mnc)
- Fix TCMU CHECK_CONDITION sense handling (mnc)
- Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)
- Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)
- Fix possible TCMU memory leak + OOPs when recalculating cmd base
size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)
- Add login_keys_workaround attribute for non RFC initiators (Robert
LeBlanc + Arun Easi + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
tcmu: clean up the code and with one small fix
tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
target: export lio pgr/alua support as device attr
target: Fix return sense reason in target_scsi3_emulate_pr_out
target: Fix cmd size for PR-OUT in passthrough_parse_cdb
tcmu: Fix dev_config_store
target: pscsi: Introduce TYPE_ZBC support
target: Use macro for WRITE_VERIFY_32 operation codes
target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
target: remove transport_complete
pscsi: finish cmd processing from pscsi_req_done
tcmu: fix sense handling during completion
target: add helper to copy sense to se_cmd buffer
target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
target: make device_mutex and device_list static
tcmu: Fix flushing cmd entry dcache page
tcmu: fix multiple uio open/close sequences
tcmu: drop configured check in destroy
...
Trond Myklebust [Tue, 11 Jul 2017 21:53:48 +0000 (17:53 -0400)]
NFS: Don't run wake_up_bit() when nobody is waiting...
"perf lock" shows fairly heavy contention for the bit waitqueue locks
when doing an I/O heavy workload.
Use a bit to tell whether or not there has been contention for a lock
so that we can optimise away the bit waitqueue options in those cases.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Peng Tao [Thu, 29 Jun 2017 13:34:53 +0000 (06:34 -0700)]
nfs: add export operations
This support for opening files on NFS by file handle, both through the
open_by_handle syscall, and for re-exporting NFS (for example using a
different version). The support is very basic for now, as each open by
handle will have to do an NFSv4 open operation on the wire. In the
future this will hopefully be mitigated by an open file cache, as well
as various optimizations in NFS for this specific case.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
[hch: incorporated various changes, resplit the patches, new changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Thu, 13 Jul 2017 20:56:24 +0000 (13:56 -0700)]
Merge tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Chuck's RDMA update overhauls the "call receive" side of the
RPC-over-RDMA transport to use the new rdma_rw API.
Christoph cleaned the way nfs operations are declared, removing a
bunch of function-pointer casts and declaring the operation vectors as
const.
Christoph's changes touch both client and server, and both client and
server pulls this time around should be based on the same commits from
Christoph"
* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
svcrdma: fix an incorrect check on -E2BIG and -EINVAL
nfsd4: factor ctime into change attribute
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
svcrdma: use offset_in_page() macro
svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
svcrdma: Clean-up svc_rdma_unmap_dma
svcrdma: Remove frmr cache
svcrdma: Remove unused Read completion handlers
svcrdma: Properly compute .len and .buflen for received RPC Calls
svcrdma: Use generic RDMA R/W API in RPC Call path
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
svcrdma: Don't account for Receive queue "starvation"
svcrdma: Improve Reply chunk sanity checking
svcrdma: Improve Write chunk sanity checking
svcrdma: Improve Read chunk sanity checking
svcrdma: Remove svc_rdma_marshal.c
svcrdma: Avoid Send Queue overflow
svcrdma: Squelch disconnection messages
sunrpc: Disable splice for krb5i
...
Linus Torvalds [Thu, 13 Jul 2017 20:53:54 +0000 (13:53 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs
Pull ext2, udf, reiserfs fixes from Jan Kara:
"Several ext2, udf, and reiserfs fixes"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix memory leak when truncate races ext2_get_blocks
reiserfs: fix race in prealloc discard
reiserfs: don't preallocate blocks for extended attributes
udf: Convert udf_disk_stamp_to_time() to use mktime64()
udf: Use time64_to_tm for timestamp conversion
udf: Fix deadlock between writeback and udf_setsize()
udf: Use i_size_read() in udf_adinicb_writepage()
udf: Fix races with i_size changes during readpage
udf: Remove unused UDF_DEFAULT_BLOCKSIZE
Linus Torvalds [Thu, 13 Jul 2017 20:44:54 +0000 (13:44 -0700)]
Merge tag '4.13-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A set of fixes for various warnings, including the one caused by the
removal of kernel/rcu/srcu.c. Also correct a stray pointer in
memory-barriers.txt"
* tag '4.13-fixes' of git://git.lwn.net/linux:
kokr/memory-barriers.txt: Fix obsolete link to atomic_ops.txt
memory-barriers.txt: Fix broken link to atomic_ops.txt
docs: Turn off section numbering for the input docs
docs: Include uaccess docs from the right file
docs: Do not include from kernel/rcu/srcu.c
Linus Torvalds [Thu, 13 Jul 2017 20:37:57 +0000 (13:37 -0700)]
Merge tag 'kbuild-v4.13-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- Move generic-y of exported headers to uapi/asm/Kbuild for complete
de-coupling of UAPI
- Clean up scripts/Makefile.headersinst
- Fix host programs for 32 bit machine with XFS file system
* tag 'kbuild-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
kbuild: Enable Large File Support for hostprogs
kbuild: remove wrapper files handling from Makefile.headersinst
kbuild: split exported generic header creation into uapi-asm-generic
kbuild: do not include old-kbuild-file from Makefile.headersinst
xtensa: move generic-y of exported headers to uapi/asm/Kbuild
unicore32: move generic-y of exported headers to uapi/asm/Kbuild
tile: move generic-y of exported headers to uapi/asm/Kbuild
sparc: move generic-y of exported headers to uapi/asm/Kbuild
sh: move generic-y of exported headers to uapi/asm/Kbuild
parisc: move generic-y of exported headers to uapi/asm/Kbuild
openrisc: move generic-y of exported headers to uapi/asm/Kbuild
nios2: move generic-y of exported headers to uapi/asm/Kbuild
nios2: remove unneeded arch/nios2/include/(generated/)asm/signal.h
microblaze: move generic-y of exported headers to uapi/asm/Kbuild
metag: move generic-y of exported headers to uapi/asm/Kbuild
m68k: move generic-y of exported headers to uapi/asm/Kbuild
m32r: move generic-y of exported headers to uapi/asm/Kbuild
ia64: remove redundant generic-y += kvm_para.h from asm/Kbuild
hexagon: move generic-y of exported headers to uapi/asm/Kbuild
h8300: move generic-y of exported headers to uapi/asm/Kbuild
...
Linus Torvalds [Thu, 13 Jul 2017 20:17:19 +0000 (13:17 -0700)]
Merge tag 'trace-v4.13-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"A few more minor updates:
- Show the tgid mappings for user space trace tools to use
- Fix and optimize the comm and tgid cache recording
- Sanitize derived kprobe names
- Ftrace selftest updates
- trace file header fix
- Update of Documentation/trace/ftrace.txt
- Compiler warning fixes
- Fix possible uninitialized variable"
* tag 'trace-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix uninitialized variable in match_records()
ftrace: Remove an unneeded NULL check
ftrace: Hide cached module code for !CONFIG_MODULES
tracing: Do note expose stack_trace_filter without DYNAMIC_FTRACE
tracing: Update Documentation/trace/ftrace.txt
tracing: Fixup trace file header alignment
selftests/ftrace: Add a testcase for kprobe event naming
selftests/ftrace: Add a test to probe module functions
selftests/ftrace: Update multiple kprobes test for powerpc
trace/kprobes: Sanitize derived event names
tracing: Attempt to record other information even if some fail
tracing: Treat recording tgid for idle task as a success
tracing: Treat recording comm for idle task as a success
tracing: Add saved_tgids file to show cached pid to tgid mappings
Jeff Layton [Thu, 29 Jun 2017 13:34:52 +0000 (06:34 -0700)]
nfs4: add NFSv4 LOOKUPP handlers
This will be needed in order to implement the get_parent export op
for nfsd.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Peng Tao [Thu, 29 Jun 2017 13:34:51 +0000 (06:34 -0700)]
nfs: add a nfs_ilookup helper
This helper will allow to find an existing NFS inode by the file handle
and fattr.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Peng Tao [Thu, 29 Jun 2017 13:34:50 +0000 (06:34 -0700)]
nfs: replace d_add with d_splice_alias in atomic_open
It's a trival change but follows knfsd export document that asks
for d_splice_alias during lookup.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Jason A. Donenfeld [Sat, 10 Jun 2017 02:59:07 +0000 (04:59 +0200)]
sunrpc: use constant time memory comparison for mac
Otherwise, we enable a MAC forgery via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Thu, 6 Jul 2017 13:43:02 +0000 (09:43 -0400)]
NFSv4.2 fix size storage for nfs42_proc_copy
Return size of COPY is u64 but it was assigned to an "int" status.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:24 +0000 (11:53 -0400)]
xprtrdma: Fix documenting comments in frwr_ops.c
Clean up.
FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:16 +0000 (11:53 -0400)]
xprtrdma: Replace PAGE_MASK with offset_in_page()
Clean up.
Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:08 +0000 (11:53 -0400)]
xprtrdma: FMR does not need list_del_init()
Clean up.
Commit
38f1932e60ba ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:00 +0000 (11:53 -0400)]
xprtrdma: Demote "connect" log messages
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.
Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:52 +0000 (11:52 -0400)]
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
When CONFIRMED_R is set, the client throws away the sequence ID
returned by the server. During a Transparent State Migration, however
there's no other way for the client to know what sequence ID to use
with a lease that's been migrated.
Therefore, the client must save and use the contrived slot sequence
value returned by the destination server even when CONFIRMED_R is
set.
Note that some servers always return a seqid of 1 after a migration.
Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:44 +0000 (11:52 -0400)]
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.
Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.
Reported-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:36 +0000 (11:52 -0400)]
xprtrdma: Don't defer MR recovery if ro_map fails
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.
Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.
Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.
Fixes:
42fe28f60763 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes:
505bbe64dd04 ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:28 +0000 (11:52 -0400)]
xprtrdma: Fix FRWR invalidation error recovery
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.
I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.
Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.
Commit
7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.
Fixes:
d7a21c1bed54 ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:20 +0000 (11:52 -0400)]
xprtrdma: Fix client lock-up after application signal fires
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.
The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.
With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.
This sets up a race between two processes:
1. After the signal, xprt_rdma_free calls ro_unmap_safe.
2. While ro_unmap_safe is still running, the server replies and
rpcrdma_reply_handler runs, calling ro_unmap_sync.
Both processes invoke ib_unmap_fmr on the same FMR.
The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.
If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.
But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.
Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes:
68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:12 +0000 (11:52 -0400)]
xprtrdma: Rename rpcrdma_req::rl_free
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.
This is a refactoring change only.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes:
68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:04 +0000 (11:52 -0400)]
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.
Since commit
9d6b04097882 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.
As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.
Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.
And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes:
68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:51:56 +0000 (11:51 -0400)]
xprtrdma: Pre-mark remotely invalidated MRs
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.
As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes:
68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:51:48 +0000 (11:51 -0400)]
xprtrdma: On invalidation failure, remove MWs from rl_registered
Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.
Fixes:
9d6b04097882 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Mon, 3 Jul 2017 05:27:26 +0000 (15:27 +1000)]
NFS: check for nfs_refresh_inode() errors in nfs_fhget()
If an NFS server returns a filehandle that we have previously
seen, and reports a different type, then nfs_refresh_inode()
will log a warning and return an error.
nfs_fhget() does not check for this error and may return an
inode with a different type than the one that the server
reported.
This is likely to cause confusion, and is one way that
->open_context() could return a directory inode as discussed
in the previous patch.
So if nfs_refresh_inode() returns and error, return that error
from nfs_fhget() to avoid the confusion propagating.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Mon, 3 Jul 2017 05:27:26 +0000 (15:27 +1000)]
NFS: guard against confused server in nfs_atomic_open()
A confused server could return a filehandle for an
NFSv4 OPEN request, which it previously returned for a directory.
So the inode returned by ->open_context() in nfs_atomic_open()
could conceivably be a directory inode.
This has particular implications for the call to
nfs_file_set_open_context() in nfs_finish_open().
If that is called on a directory inode, then the nfs_open_context
that gets stored in the filp->private_data will be linked to
nfs_inode->open_files.
When the directory is closed, nfs_closedir() will (ultimately)
free the ->private_data, but not unlink it from nfs_inode->open_files
(because it doesn't expect an nfs_open_context there).
Subsequently the memory could get used for something else and eventually
if the ->open_files list is walked, the walker will fall off the end and
crash.
So: change nfs_finish_open() to only call nfs_file_set_open_context()
for regular-file inodes.
This failure mode has been seen in a production setting (unknown NFS
server implementation). The kernel was v3.0 and the specific sequence
seen would not affect more recent kernels, but I think a risk is still
present, and caution is wise.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Wed, 5 Jul 2017 02:22:20 +0000 (12:22 +1000)]
NFS: only invalidate dentrys that are clearly invalid.
Since commit
bafc9b754f75 ("vfs: More precise tests in d_invalidate")
in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
to be invalidated even if it has filesystems mounted on or it or on a
descendant. The mounted filesystem is unmounted.
This means we need to be careful not to return 0 unless the directory
referred to truly is invalid. So -ESTALE or -ENOENT should invalidate
the directory. Other errors such a -EPERM or -ERESTARTSYS should be
returned from ->d_revalidate() so they are propagated to the caller.
A particular problem can be demonstrated by:
1/ mount an NFS filesystem using NFSv3 on /mnt
2/ mount any other filesystem on /mnt/foo
3/ ls /mnt/foo
4/ turn off network, or otherwise make the server unable to respond
5/ ls /mnt/foo &
6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
7/ kill -9 $! # this results in -ERESTARTSYS being returned
8/ observe that /mnt/foo has been unmounted.
This patch changes nfs_lookup_revalidate() to only treat
-ESTALE from nfs_lookup_verify_inode() and
-ESTALE or -ENOENT from ->lookup()
as indicating an invalid inode. Other errors are returned.
Also nfs_check_inode_attributes() is changed to return -ESTALE rather
than -EIO. This is consistent with the error returned in similar
circumstances from nfs_update_inode().
As this bug allows any user to unmount a filesystem mounted on an NFS
filesystem, this fix is suitable for stable kernels.
Fixes:
bafc9b754f75 ("vfs: More precise tests in d_invalidate")
Cc: stable@vger.kernel.org (v3.18+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Fri, 23 Jun 2017 14:26:59 +0000 (10:26 -0400)]
PNFS for stateid errors retry against MDS first
Upon receiving a stateid error such as BAD_STATEID, the client
should retry the operation against the MDS before deciding to
do stateid recovery.
Previously, the code would initiate state recovery and it could
lead to a race in a state manager that could chose an incorrect
recovery method which would lead to the EIO failure for the
application.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Olga Kornievskaia [Fri, 23 Jun 2017 14:26:58 +0000 (10:26 -0400)]
PNFS fix EACCESS on commit to DS handling
Commit
fabbbee0eb0f "PNFS fix fallback to MDS if got error on
commit to DS" moved the pnfs_set_lo_fail() to unhandled errors
which was not correct and lead to a kernel oops on umount.
Instead, fix the original EACCESS on commit to DS error by
getting the new layout and re-doing the IO.
Fixes:
fabbbee0eb0f ("PNFS fix fallback to MDS if got error on commit to DS")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # v4.12
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Dan Carpenter [Fri, 23 Jun 2017 15:16:25 +0000 (18:16 +0300)]
NFS: silence a uninitialized variable warning
Static checkers have gotten clever enough to complain that "id_long" is
uninitialized on the failure path. It's harmless, but simple to fix.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tuo Chen Peng [Wed, 7 Jun 2017 06:42:44 +0000 (23:42 -0700)]
nfs: Fix fscache stat printing in nfs_show_stats()
nfs_show_stats() was incorrectly reading statistics for bytes when printing that
for fsc. It caused files like /proc/self/mountstats to report incorrect fsc
statistics for NFS mounts.
Signed-off-by: Tuo Chen Peng <tpeng@nvidia.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Benjamin Coddington [Fri, 9 Jun 2017 15:03:23 +0000 (11:03 -0400)]
NFS: Fix initialization of nfs_page_array->npages
Commit
8ef9b0b9e1c0 open-coded nfs_pgarray_set(), and left out the
initialization of the nfs_page_array's npages. This mistake didn't show up
until testing with block layouts, and there shows that all pNFS reads
return -EIO.
Fixes:
8ef9b0b9e1c0 ("NFS: move nfs_pgarray_set() to open code")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 4.12
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:38 +0000 (19:35 -0400)]
NFS: Fix commit policy for non-blocking calls to nfs_write_inode()
Now that the writes will schedule a commit on their own, we don't
need nfs_write_inode() to schedule one if there are outstanding
writes, and we're being called in non-blocking mode.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:37 +0000 (19:35 -0400)]
NFS: Ensure we commit after writeback is complete
If the page cache is being flushed, then we want to ensure that we
do start a commit once the pages are done being flushed.
If we just wait until all I/O is done to that file, we can end up
livelocking until the balance_dirty_pages() mechanism puts its
foot down and forces I/O to stop.
So instead we do more or less the same thing that O_DIRECT does,
and set up a counter to tell us when the flush is done,
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:36 +0000 (19:35 -0400)]
NFS: Remove unused fields in the page I/O structures
Remove the 'layout_private' fields that were only used by the pNFS OSD
layout driver.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:39 +0000 (19:35 -0400)]
SUNRPC: Make slot allocation more reliable
In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
between the atomic_add_unless() failure and the call to xprt_add_backlog().
We do not actually need to hold it across the memory allocation itself.
By dropping the lock, we can use a more resilient GFP_NOFS allocation,
just as we now do in the rest of the RPC client code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Benjamin Coddington [Fri, 16 Jun 2017 15:13:00 +0000 (11:13 -0400)]
NFS: nfs_rename() - revalidate directories on -ERESTARTSYS
An interrupted rename will leave the old dentry behind if the rename
succeeds. Fix this by forcing a lookup the next time through
->d_revalidate.
A previous attempt at solving this problem took the approach to complete
the work of the rename asynchronously, however that approach was wrong
since it would allow the d_move() to occur after the directory's i_mutex
had been dropped by the original process.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Benjamin Coddington [Tue, 20 Jun 2017 12:33:44 +0000 (08:33 -0400)]
NFS: convert flags to bool
NFS uses some int, and unsigned int :1, and bool as flags in structs and
args. Assert the preference for uniformly replacing these with the bool
type.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Anna Schumaker [Fri, 16 Jun 2017 16:06:59 +0000 (12:06 -0400)]
NFS: Set FATTR4_WORD0_TYPE for . and .. entries
The current code worked okay for getdents(), but getdents64() expects
the d_type field to get filled out properly in the stat structure.
Setting this field fixes xfstests generic/401.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Christoph Hellwig [Mon, 8 May 2017 21:46:47 +0000 (23:46 +0200)]
nfsd4: const-ify nfsd4_ops
nfsd4_ops contains function pointers, and marking it as constant avoids
it being able to be used as an attach vector for code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 12 May 2017 14:21:37 +0000 (16:21 +0200)]
sunrpc: mark all struct svc_version instances as const
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Fri, 12 May 2017 14:11:49 +0000 (16:11 +0200)]
sunrpc: mark all struct svc_procinfo instances as const
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 21:40:27 +0000 (23:40 +0200)]
sunrpc: move pc_count out of struct svc_procinfo
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 18:58:35 +0000 (20:58 +0200)]
nfsd4: properly type op_func callbacks
Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.
It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 18:42:10 +0000 (20:42 +0200)]
nfsd4: remove nfsd4op_rsize
Except for a lot of unnecessary casts this typedef only has one user,
so remove the casts and expand it in struct nfsd4_operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 18:37:33 +0000 (20:37 +0200)]
nfsd4: properly type op_get_currentstateid callbacks
Pass union nfsd4_op_u to the op_set_currentstateid callbacks instead of
using unsafe function pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 18:03:15 +0000 (20:03 +0200)]
nfsd4: properly type op_set_currentstateid callbacks
Given the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 17:56:10 +0000 (19:56 +0200)]
sunrpc: remove kxdrproc_t
Remove the now unused typedef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 17:42:02 +0000 (19:42 +0200)]
sunrpc: properly type pc_encode callbacks
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 17:01:48 +0000 (19:01 +0200)]
sunrpc: properly type pc_decode callbacks
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 16:48:24 +0000 (18:48 +0200)]
sunrpc: properly type pc_release callbacks
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 15:35:49 +0000 (17:35 +0200)]
sunrpc: properly type pc_func callbacks
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 16:03:06 +0000 (18:03 +0200)]
nfsd: remove the unused PROC() macro in nfs3proc.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 15:59:13 +0000 (17:59 +0200)]
nfsd: use named initializers in PROC()
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 12 May 2017 13:58:06 +0000 (15:58 +0200)]
nfsd4: const-ify nfs_cb_version4
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Fri, 12 May 2017 13:36:49 +0000 (15:36 +0200)]
sunrpc: mark all struct rpc_procinfo instances as const
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Fri, 12 May 2017 13:51:24 +0000 (15:51 +0200)]
nfs: use ARRAY_SIZE() in the nfsacl_version3 declaration
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 21:27:10 +0000 (23:27 +0200)]
sunrpc: move p_count out of struct rpc_procinfo
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Mon, 8 May 2017 21:32:18 +0000 (23:32 +0200)]
lockd: fix some weird indentation
Remove double indentation of a few struct rpc_version and
struct rpc_program instance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Thu, 11 May 2017 07:22:18 +0000 (09:22 +0200)]
nfs: don't cast callback decode/proc/encode routines
Instead declare all functions with the proper methods signature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 13:09:02 +0000 (15:09 +0200)]
nfs: fix decoder callback prototypes
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 13:06:20 +0000 (15:06 +0200)]
lockd: fix decoder callback prototypes
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 13:04:45 +0000 (15:04 +0200)]
nfsd: fix decoder callback prototypes
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Christoph Hellwig [Mon, 8 May 2017 13:03:02 +0000 (15:03 +0200)]
sunrpc/auth_gss: fix decoder callback prototypes
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 13:00:29 +0000 (15:00 +0200)]
sunrpc: fix decoder callback prototypes
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Christoph Hellwig [Mon, 8 May 2017 12:58:11 +0000 (14:58 +0200)]
sunrpc: properly type argument to kxdrdproc_t
Pass struct rpc_request as the first argument instead of an untyped blob.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 12:54:06 +0000 (14:54 +0200)]
sunrpc/auth_gss: nfsd: fix encoder callback prototypes
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 12:47:53 +0000 (14:47 +0200)]
nfsd: fix encoder callback prototypes
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Christoph Hellwig [Mon, 8 May 2017 08:01:49 +0000 (10:01 +0200)]
nfs: fix encoder callback prototypes
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 07:34:04 +0000 (09:34 +0200)]
lockd: fix encoder callback prototypes
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 07:59:45 +0000 (09:59 +0200)]
sunrpc: fix encoder callback prototypes
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Christoph Hellwig [Mon, 8 May 2017 07:31:19 +0000 (09:31 +0200)]
sunrpc: properly type argument to kxdreproc_t
Pass struct rpc_request as the first argument instead of an untyped blob,
and mark the data object as const.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Linus Torvalds [Thu, 13 Jul 2017 19:38:49 +0000 (12:38 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:
- various misc things
- kexec updates
- sysctl core updates
- scripts/gdb udpates
- checkpoint-restart updates
- ipc updates
- kernel/watchdog updates
- Kees's "rough equivalent to the glibc _FORTIFY_SOURCE=1 feature"
- "stackprotector: ascii armor the stack canary"
- more MM bits
- checkpatch updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
writeback: rework wb_[dec|inc]_stat family of functions
ARM: samsung: usb-ohci: move inline before return type
video: fbdev: omap: move inline before return type
video: fbdev: intelfb: move inline before return type
USB: serial: safe_serial: move __inline__ before return type
drivers: tty: serial: move inline before return type
drivers: s390: move static and inline before return type
x86/efi: move asmlinkage before return type
sh: move inline before return type
MIPS: SMP: move asmlinkage before return type
m68k: coldfire: move inline before return type
ia64: sn: pci: move inline before type
ia64: move inline before return type
FRV: tlbflush: move asmlinkage before return type
CRIS: gpio: move inline before return type
ARM: HP Jornada 7XX: move inline before return type
ARM: KVM: move asmlinkage before type
checkpatch: improve the STORAGE_CLASS test
mm, migration: do not trigger OOM killer when migrating memory
drm/i915: use __GFP_RETRY_MAYFAIL
...
Linus Torvalds [Thu, 13 Jul 2017 19:28:06 +0000 (12:28 -0700)]
Merge tag 'platform-drivers-x86-v4.13-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull more x86 platform driver updates from Darren Hart:
"Add new platform matches for silead_dmi and ideapad-laptop. Several
constify patches for attribute_group structures. Fixes for peaq-wmi
and intel_telemetry.
silead_dmi:
- Add entry for Ployer Momo7w tablet touchscreen
- Add touchscreen info for I.T.Works TW891 2-in-1
toshiba_acpi:
- constify attribute_group structures.
asus-wmi:
- constify attribute_group structures.
panasonic-laptop:
- constify attribute_group structures.
alienware-wmi:
- constify attribute_group structures.
samsung-laptop:
- constify attribute_group structures.
compal-laptop:
- constify attribute_group structures.
fujitsu-laptop:
- constify attribute_group structures.
- add NULL check on devm_kzalloc() return value
peaq-wmi:
- Fix peaq_ignore_events_counter handling off by 1
ideapad-laptop:
- Fix indentation in DMI table
- Add several models to no_hw_rfkill
- Add IdeaPad V510-15IKB to no_hw_rfkill
intel_telemetry:
- Add debugfs entry for S0ix residency
intel_telemetry_debugfs:
- fix some error codes in init
- fix oops when load/unload module"
* tag 'platform-drivers-x86-v4.13-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: silead_dmi: Add entry for Ployer Momo7w tablet touchscreen
platform/x86: toshiba_acpi: constify attribute_group structures.
platform/x86: asus-wmi: constify attribute_group structures.
platform/x86: panasonic-laptop: constify attribute_group structures.
platform/x86: alienware-wmi: constify attribute_group structures.
platform/x86: samsung-laptop: constify attribute_group structures.
platform/x86: compal-laptop: constify attribute_group structures.
platform/x86: fujitsu-laptop: constify attribute_group structures.
platform/x86: peaq-wmi: Fix peaq_ignore_events_counter handling off by 1
platform/x86: fujitsu-laptop: add NULL check on devm_kzalloc() return value
platform/x86: silead_dmi: Add touchscreen info for I.T.Works TW891 2-in-1
platform/x86: ideapad-laptop: Fix indentation in DMI table
platform/x86: ideapad-laptop: Add several models to no_hw_rfkill
platform/x86: ideapad-laptop: Add IdeaPad V510-15IKB to no_hw_rfkill
platform/x86: intel_telemetry: Add debugfs entry for S0ix residency
platform/x86: intel_telemetry_debugfs: fix some error codes in init
platform/x86: intel_telemetry_debugfs: fix oops when load/unload module
Linus Torvalds [Thu, 13 Jul 2017 19:23:54 +0000 (12:23 -0700)]
Merge tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- Include Intel XXV710 in INTx workaround (Alex Williamson)
- Make use of ERR_CAST() for error return (Dan Carpenter)
- Fix vfio_group release deadlock from iommu notifier (Alex Williamson)
- Unset KVM-VFIO attributes only on group match (Alex Williamson)
- Fix release path group/file matching with KVM-VFIO (Alex Williamson)
- Remove unnecessary lock uses triggering lockdep splat (Alex Williamson)
* tag 'vfio-v4.13-rc1' of git://github.com/awilliam/linux-vfio:
vfio: Remove unnecessary uses of vfio_container.group_lock
vfio: New external user group/file match
kvm-vfio: Decouple only when we match a group
vfio: Fix group release deadlock
vfio: Use ERR_CAST() instead of open coding it
vfio/pci: Add Intel XXV710 to hidden INTx devices
Linus Torvalds [Thu, 13 Jul 2017 19:15:06 +0000 (12:15 -0700)]
Merge tag 'rtc-4.13' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Here is the pull-request for the RTC subsystem for 4.13.
Subsystem:
- expose non volatile RAM using nvmem instead of open coding in many
drivers. Unfortunately, this option has to be enabled by default to
not break existing users.
- rtctest can now test for cutoff dates, showing when an RTC will
start failing to properly save time and date.
- new RTC registration functions to remove race conditions in drivers
Newly supported RTCs:
- Broadcom STB wake-timer
- Epson RX8130CE
- Maxim IC DS1308
- STMicroelectronics STM32H7
Drivers:
- ds1307: use regmap, use nvmem, more cleanups
- ds3232: temperature reading support
- gemini: renamed to ftrtc010
- m41t80: use CCF to expose the clock
- rv8803: use nvmem
- s3c: many cleanups
- st-lpc: fix y2106 bug"
* tag 'rtc-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (51 commits)
rtc: Remove wrong deprecation comment
nvmem: include linux/err.h from header
rtc: st-lpc: make it robust against y2038/2106 bug
rtc: rtctest: add check for problematic dates
tools: timer: add rtctest_setdate
rtc: ds1307: remove ds1307_remove
rtc: ds1307: use generic nvmem
rtc: ds1307: switch to rtc_register_device
rtc: rv8803: remove rv8803_remove
rtc: rv8803: use generic nvmem support
rtc: rv8803: switch to rtc_register_device
rtc: add generic nvmem support
rtc: at91rm9200: remove race condition
rtc: introduce new registration method
rtc: class separate id allocation from registration
rtc: class separate device allocation from registration
rtc: stm32: add STM32H7 RTC support
dt-bindings: rtc: stm32: add support for STM32H7
rtc: ds1307: add ds1308 variant
rtc: ds3232: add temperature support
...
Linus Torvalds [Thu, 13 Jul 2017 19:07:44 +0000 (12:07 -0700)]
Merge tag 'for-linus-
20170713' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"General updates:
- Cleanups and additional flash support for "dataflash" driver
- new driver for mchp23k256 SPI SRAM device
- improve handling of MTDs without eraseblocks (i.e., MTD_NO_ERASE)
- refactor and improve "sub-partition" handling with TRX partition
parser; partitions can now be created as sub-partitions of another
partition
SPINOR updates, from Cyrille Pitchen and Marek Vasut:
- introduce support to the SPI 1-2-2 and 1-4-4 protocols.
- introduce support to the Double Data Rate (DDR) mode.
- introduce support to the Octo SPI protocols.
- add support to new memory parts for Spansion, Macronix and Winbond.
- add fixes for the Aspeed, STM32 and Cadence QSPI controler drivers.
- clean up the st_spi_fsm driver.
NAND updates, from Boris Brezillon:
- addition of on-die ECC support to Micron driver
- addition of helpers to help drivers choose most appropriate ECC
settings
- deletion of dead-code (cached programming and ->errstat() hook)
- make sure drivers that do not support the SET/GET FEATURES command
return ENOTSUPP use a dummy ->set/get_features implementation
returning -ENOTSUPP (required for Micron on-die ECC)
- change the semantic of ecc->write_page() for drivers setting the
NAND_ECC_CUSTOM_PAGE_ACCESS flag
- support exiting 'GET STATUS' command in default ->cmdfunc()
implementations
- change the prototype of ->setup_data_interface()
A bunch of driver related changes:
- various cleanup, fixes and improvements of the MTK driver
- OMAP DT bindings fixes
- support for ->setup_data_interface() in the fsmc driver
- support for imx7 in the gpmi driver
- finalization of the denali driver rework (thanks to Masahiro for
the work he's done on this driver)
- fix "bitflips in erased pages" handling in the ifc driver
- addition of PM ops and dynamic timing configuration to the atmel
driver"
* tag 'for-linus-
20170713' of git://git.infradead.org/linux-mtd: (118 commits)
Documentation: ABI: mtd: describe "offset" more precisely
mtd: Fix check in mtd_unpoint()
mtd: nand: mtk: release lock on error path
mtd: st_spi_fsm: remove SPINOR_OP_RDSR2 and use SPINOR_OP_RDCR instead
mtd: spi-nor: cqspi: remove duplicate const
mtd: spi-nor: Add support for Spansion S25FL064L
mtd: spi-nor: Add support for mx66u51235f
mtd: nand: mtk: add ->setup_data_interface() hook
mtd: nand: mtk: remove unneeded mtk_ecc_hw_init from mtk_ecc_resume
mtd: nand: mtk: remove unneeded mtk_nfc_hw_init from mtk_nfc_resume
mtd: nand: mtk: disable ecc irq when writing page with hwecc
mtd: nand: mtk: fix incorrect register setting order about ecc irq
mtd: partitions: fixup some allocate_partition() whitespace
mtd: parsers: trx: fix pr_err format for printing offset
MAINTAINERS: Update SPI NOR subsystem git repositories
mtd: extract TRX parser out of bcm47xxpart into a separated module
mtd: partitions: add support for partition parsers
mtd: partitions: add support for subpartitions
mtd: partitions: rename "master" to the "parent" where appropriate
mtd: partitions: remove sysfs files when deleting all master's partitions
...
Linus Torvalds [Thu, 13 Jul 2017 18:52:00 +0000 (11:52 -0700)]
Merge tag 'fbdev-v4.13' of git://github.com/bzolnier/linux
Pull fbdev updates from Bartlomiej Zolnierkiewicz:
"There is nothing really major here, just a couple of small bugfixes,
improvements and cleanups.
- fix get_fb_unmapped_area() helper handling (Benjamin Gaignard)
- check return value of clk_prepare_enable() in pxafb driver (Arvind
Yadav)
- fix error path handling in vmlfb driver (Alexey Khoroshilov)
- printks fixes/cleanups for uvesafb driver (Joe Perches)
- fix unusued variable warning in atyfb driver (Arnd Bergmann)
- constifications for sh_mobile_lcdcfb, fsl-diu-fb, omapfb (Arvind
Yadav)
- mdacon driver cleanups (Jiri Slaby)
- misc cleanups (Andy Shevchenko, Karim Eshapa, Gustavo A. R. Silva,
Dan Carpenter)"
* tag 'fbdev-v4.13' of git://github.com/bzolnier/linux:
fbdev: make get_fb_unmapped_area depends of !MMU
atyfb: hide unused variable
video: fbdev: matrox: the list iterator can't be NULL
video: fbdev: aty: remove useless variable assignments in aty_var_to_crtc()
fbdev: omapfb: constify ctrl_caps, color_caps, panel_attr_grp and ctrl_attr_grp
omapfb: panel-dsi-cm: constify dsicm_attr_group
vmlfb: Fix error handling in cr_pll_init()
video: fbdev: fsl-diu-fb: constify mfb_template and fsl_diu_match.
uvesafb: Fix continuation printks without KERN_LEVEL to pr_cont, neatening
video: fbdev: sh_mobile_lcdcfb: constify sh_mobile_lcdc_bl_ops.
omapfb: Use sysfs_match_string() helper
video: fbdev: pxafb: Handle return value of clk_prepare_enable
video: fbdev: omap2: omapfb: displays: panel-dsi-cm: Use time comparison kernel macro.
mdacon: replace MDA_ADDR macro by inline function
mdacon: make mda_vram_base u16 *
mdacon: align code in mda_detect properly
Linus Torvalds [Thu, 13 Jul 2017 18:49:52 +0000 (11:49 -0700)]
Merge tag 'pwm/for-4.13-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"This release cycle's changes include mostly updates and cleanups to
existing drivers along with a few cleanups to the core, documentation
and device tree bindings"
* tag 'pwm/for-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: cros-ec: Fix transposed param settings
pwm: meson: Improve PWM calculation precision
dt-bindings: pwm: meson: Add compatible for gxbb ao PWMs
pwm: meson: Add compatible for the gxbb ao PWMs
pwm: sun4i: Drop legacy callbacks
pwm: sun4i: Switch to atomic PWM
pwm: sun4i: Improve hardware read out
pwm: hibvt: Constify hibvt_pwm_ops
pwm: Silently error out on EPROBE_DEFER
pwm: Standardize document format
pwm: bfin: Remove unneeded error message
dt-bindings: pwm: Update STM32 timers clock names
dt-bindings: pwm: Add R-Car M3-W device tree bindings
pwm: tegra: Set maximum pwm clock source per SoC tapeout
Linus Torvalds [Thu, 13 Jul 2017 18:47:59 +0000 (11:47 -0700)]
Merge tag 'for-v4.13-2' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Reichel:
"I have two more fixes for the power-supply subsystem:
- two fixes for twl4030-charger"
* tag 'for-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: twl4030-charger: add deferred probing for phy and iio
power: supply: twl4030-charger: move irq allocation to just before irqs are enabled
Linus Torvalds [Thu, 13 Jul 2017 18:26:18 +0000 (11:26 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc1' of git://people.freedesktop.org/~airlied/linux
Pull more drm updates from Dave Airlie:
"i915, amd and some core fixes + mediatek color support.
Some fixes tree came in since the main pull request for rc1, primarily
i915 and drm-misc and one amd fix. The drm core vblank regression fix
is probably the most important thing.
I've also added the mediatek feature pull, it wasn't that big and
didn't look like it would have any impact outside of mediatek, in fact
it looks to just be a single feature, and some cleanups"
* tag 'drm-fixes-for-v4.13-rc1' of git://people.freedesktop.org/~airlied/linux: (31 commits)
drm/i915: Make DP-MST connector info work
drm/i915/gvt: Use fence error from GVT request for workload status
drm/i915/gvt: remove scheduler_mutex in per-engine workload_thread
drm/i915/gvt: Revert "drm/i915/gvt: Fix possible recursive locking issue"
drm/i915/gvt: Audit the command buffer address
drm/i915/gvt: Fix a memory leak in intel_gvt_init_gtt()
drm/rockchip: fix NULL check on devm_kzalloc() return value
drm/i915/fbdev: Check for existence of ifbdev->vma before operations
drm/radeon: Fix eDP for single-display iMac10,1 (v2)
drm/i915: Hold RPM wakelock while initializing OA buffer
drm/i915/cnl: Fix the CURSOR_COEFF_MASK used in DDI Vswing Programming
drm/i915/cfl: Fix Workarounds.
drm/i915: Avoid undefined behaviour of "u32 >> 32"
drm/i915: reintroduce VLV/CHV PFI programming power domain workaround
drm/i915: Fix an error checking test
drm/i915: Disable MSI for all pre-gen5
drm/atomic: Add missing drm_atomic_state_clear to atomic_remove_fb
drm: vblank: Fix vblank timestamp update
drm/i915/gvt: Make function dpy_reg_mmio_readx safe
drm/mediatek: separate color module to fixup error memory reallocation
...
Jeffy Chen [Wed, 12 Jul 2017 06:18:32 +0000 (14:18 +0800)]
drm: Add missing field copy in compat_drm_version
DRM_IOCTL_VERSION is supposed to update the name_len/date_len/desc_len
fields to user.
Fixes:
012c6741c6aa ("switch compat_drm_version() to drm_ioctl_kernel()")
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Colin Ian King [Thu, 13 Jul 2017 17:51:15 +0000 (18:51 +0100)]
svcrdma: fix an incorrect check on -E2BIG and -EINVAL
The current check will always be true and will always jump to
err1, this looks dubious to me. I believe && should be used
instead of ||.
Detected by CoverityScan, CID#
1450120 ("Logically Dead Code")
Fixes:
107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Rafał Miłecki [Sun, 25 Jun 2017 11:11:54 +0000 (13:11 +0200)]
Documentation: ABI: mtd: describe "offset" more precisely
So far Linux supported only two levels of MTD devices so we didn't need
a very precise description for this sysfs file. With commit
97519dc52b44a ("mtd: partitions: add support for subpartitions") there
is support for a tree structure so we should have more precise
description. Using "parent" and "flash device" makes it more accurate.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>