nfs: take extra reference to fl->fl_file when running a setlk
We had a report of a crash while stress testing the NFS client:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000150
IP: [<
ffffffff8127b698>] locks_get_lock_context+0x8/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task:
ffff880036aea7c0 ti:
ffff8800791f4000 task.ti:
ffff8800791f4000
RIP: 0010:[<
ffffffff8127b698>] [<
ffffffff8127b698>] locks_get_lock_context+0x8/0x90
RSP: 0018:
ffff8800791f7c00 EFLAGS:
00010293
RAX:
ffff8800791f7c40 RBX:
ffff88001f2ad8c0 RCX:
ffffe8ffffc80305
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffff8800791f7c88 R08:
ffff88007fc971d8 R09:
279656d600000000
R10:
0000034a01000000 R11:
279656d600000000 R12:
ffff88001f2ad918
R13:
ffff88001f2ad8c0 R14:
0000000000000000 R15:
0000000100e73040
FS:
0000000000000000(0000) GS:
ffff88007fc80000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000150 CR3:
0000000001c0b000 CR4:
00000000000407e0
Stack:
ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
Call Trace:
[<
ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
[<
ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
[<
ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
[<
ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
[<
ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
[<
ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
[<
ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
[<
ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
[<
ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
[<
ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
[<
ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
[<
ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
[<
ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
[<
ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
[<
ffffffff810baa1b>] process_one_work+0x1bb/0x410
[<
ffffffff810bacc3>] worker_thread+0x53/0x480
[<
ffffffff810bac70>] ? process_one_work+0x410/0x410
[<
ffffffff810bac70>] ? process_one_work+0x410/0x410
[<
ffffffff810c0b38>] kthread+0xd8/0xf0
[<
ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
[<
ffffffff817a1aa2>] ret_from_fork+0x42/0x70
[<
ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
Jean says:
"Running locktests with a large number of iterations resulted in a
client crash. The test run took a while and hasn't finished after close
to 2 hours. The crash happened right after I gave up and killed the test
(after 107m) with Ctrl+C."
The crash happened because a NULL inode pointer got passed into
locks_get_lock_context. The call chain indicates that file_inode(filp)
returned NULL, which means that f_inode was NULL. Since that's zeroed
out in __fput, that suggests that this filp pointer outlived the last
reference.
Looking at the code, that seems possible. We copy the struct file_lock
that's passed in, but if the task is signalled at an inopportune time we
can end up trying to use that file_lock in rpciod context after the process
that requested it has already returned (and possibly put its filp
reference).
Fix this by taking an extra reference to the filp when we allocate the
lock info, and put it in nfs4_lock_release.
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>