GitHub/moto-9609/android_kernel_motorola_exynos9610.git
9 years agonfs42: remove unused declaration
Peng Tao [Tue, 25 Aug 2015 16:13:16 +0000 (00:13 +0800)]
nfs42: remove unused declaration

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agonfs42: decode_layoutstats does not need res parameter
Peng Tao [Tue, 25 Aug 2015 16:13:15 +0000 (00:13 +0800)]
nfs42: decode_layoutstats does not need res parameter

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/flexfiles: Allow coalescing of new layout segments and existing ones
Trond Myklebust [Tue, 25 Aug 2015 21:38:25 +0000 (17:38 -0400)]
NFSv4.1/flexfiles: Allow coalescing of new layout segments and existing ones

In order to ensure atomicity of updates, we merge the old layout segments
into the new ones, and then invalidate the old ones.

Also ensure that we order the list of layout segments so that
RO segments are preferred over RW.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertion
Trond Myklebust [Tue, 25 Aug 2015 12:54:17 +0000 (08:54 -0400)]
NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertion

This is needed in order to allow merging of contiguous layout segments,
and also to correct the ordering of layouts for those device drivers that
don't necessarily want to place the read-write layouts first.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Add sanity check for the layout range returned by the server
Trond Myklebust [Tue, 25 Aug 2015 15:16:13 +0000 (11:16 -0400)]
NFSv4.1/pnfs: Add sanity check for the layout range returned by the server

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs Improve the packing of struct pnfs_layout_hdr
Trond Myklebust [Tue, 25 Aug 2015 12:41:24 +0000 (08:41 -0400)]
NFSv4.1/pnfs Improve the packing of struct pnfs_layout_hdr

Eliminate a couple of holes in the structure, and move the 2 atomics
into the same cacheline.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/flexfile: ff_layout_remove_mirror can be static
kbuild test robot [Tue, 25 Aug 2015 03:19:25 +0000 (11:19 +0800)]
NFSv4.1/flexfile: ff_layout_remove_mirror can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.2/pnfs: Make the layoutstats timer configurable
Trond Myklebust [Tue, 25 Aug 2015 00:39:18 +0000 (20:39 -0400)]
NFSv4.2/pnfs: Make the layoutstats timer configurable

Allow advanced users to set the layoutstats timer in order to lengthen
or shorten the period between layoutstat transmissions to the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/flexfile: Ensure uniqueness of mirrors across layout segments
Trond Myklebust [Tue, 25 Aug 2015 00:03:17 +0000 (20:03 -0400)]
NFSv4.1/flexfile: Ensure uniqueness of mirrors across layout segments

Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that
they can be shared among the layout segments that use them.
Also ensure that we send out only one copy of the layoutstats per mirror.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/flexfiles: Remove mirror backpointer to lseg.
Trond Myklebust [Mon, 24 Aug 2015 22:22:28 +0000 (18:22 -0400)]
NFSv4.1/flexfiles: Remove mirror backpointer to lseg.

When we start sharing mirrors between several lsegs, we won't be able to
keep it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/flexfiles: Add refcounting to struct nfs4_ff_layout_mirror
Trond Myklebust [Mon, 24 Aug 2015 22:08:30 +0000 (18:08 -0400)]
NFSv4.1/flexfiles: Add refcounting to struct nfs4_ff_layout_mirror

We do want to share mirrors between layout segments, so add a refcount
to enable that.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agostaging/lustre/o2iblnd: remove references to ib_reg_phsy_mr()
Oleg Drokin [Wed, 19 Aug 2015 01:04:35 +0000 (21:04 -0400)]
staging/lustre/o2iblnd: remove references to ib_reg_phsy_mr()

Removed references to ib_reg_phsy_mr() and PMR which was added
to deal with some Chelsio specific scenario, but no longer needed
now.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Amir Shehata <amir.shehata@intel.com>
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS41/flexfiles: zero out DS write wcc
Peng Tao [Fri, 21 Aug 2015 22:40:00 +0000 (06:40 +0800)]
NFS41/flexfiles: zero out DS write wcc

We do not want to update inode attributes with DS values.

Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS41: remove NFS_LAYOUT_ROC flag
Peng Tao [Fri, 21 Aug 2015 04:49:44 +0000 (12:49 +0800)]
NFS41: remove NFS_LAYOUT_ROC flag

If we return delegation before closing, we fail to do roc check
during close because NFS_LAYOUT_ROC is cleared by delegreturn
and it causes layouts to be still hanging around after delegreturn
+ close, which is a voilation against protocol.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4: Add a tracepoint for CB_LAYOUTRECALL
Trond Myklebust [Fri, 21 Aug 2015 01:43:14 +0000 (20:43 -0500)]
NFSv4: Add a tracepoint for CB_LAYOUTRECALL

Only support for single file layoutrecall for now.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4: Add a tracepoint for CB_GETATTR
Trond Myklebust [Fri, 21 Aug 2015 01:07:54 +0000 (20:07 -0500)]
NFSv4: Add a tracepoint for CB_GETATTR

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Add a tracepoint for return-on-close events
Trond Myklebust [Thu, 20 Aug 2015 20:40:47 +0000 (15:40 -0500)]
NFSv4.1/pnfs: Add a tracepoint for return-on-close events

Allow tracing of return-on-close.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4: Force a post-op attribute update when holding a delegation
Trond Myklebust [Thu, 20 Aug 2015 23:56:07 +0000 (18:56 -0500)]
NFSv4: Force a post-op attribute update when holding a delegation

If the ctime or mtime or change attribute have changed because
of an operation we initiated, we should make sure that we force
an attribute update. However we do not want to mark the page cache
for revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
9 years agoNFSv4.1/pnfs Ensure flexfiles reports all connection related errors
Trond Myklebust [Thu, 20 Aug 2015 22:59:49 +0000 (17:59 -0500)]
NFSv4.1/pnfs Ensure flexfiles reports all connection related errors

Make sure that we also handle RPC level connection and protocol
negotiation errors.

Reported-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Ensure the flexfiles layoutstats timers are consistent
Trond Myklebust [Thu, 20 Aug 2015 18:12:51 +0000 (13:12 -0500)]
NFSv4.1/pnfs: Ensure the flexfiles layoutstats timers are consistent

We want to ensure that the stopwatches for the busy timer and the
aggregate timer are consistent. This means that they need to use
the same start/stop times.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS41: fix list splice type
Peng Tao [Fri, 21 Aug 2015 02:32:50 +0000 (10:32 +0800)]
NFS41: fix list splice type

We want to move commiting pages to pages list instead.
Otherwise it causes pnfs small writes crash like:

[34560.037692] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[34560.038557] IP: [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.039400] PGD 69f5a067 PUD 69f59067 PMD 0
[34560.040207] Oops: 0000 [#1] SMP
[34560.041014] Modules linked in: nfsv3(OE) nfs_layout_flexfiles(OE) nfsv4(OE) nfs(OE) fscache(E) rpcsec_gss_krb5(E) xt_addrtype(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) iptable_filter(E) ip_tables(E) x_tables(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ppdev(E) vmw_balloon(E) coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) psmouse(E) serio_raw(E) vmw_vmci(E) i2c_piix4(E) shpchp(E) parport_pc(E) parport(E) mac_hid(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) xfs(E) libcrc32c(E) hid_generic(E) usbhid(E) hid(E) e1000(E) mptspi(E)
[34560.045106]  mptscsih(E) mptbase(E) vmwgfx(E) drm_kms_helper(E) ttm(E) drm(E) autofs4(E) [last unloaded: fscache]
[34560.045897] CPU: 0 PID: 130543 Comm: bash Tainted: G           OE   4.2.0-rc5-dp-00057-gf993a93 #11
[34560.046699] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[34560.047525] task: ffff880031b0a980 ti: ffff880045fec000 task.ti: ffff880045fec000
[34560.048264] RIP: 0010:[<ffffffffa05423d6>]  [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.049000] RSP: 0018:ffff880045fefc18  EFLAGS: 00010246
[34560.049717] RAX: 0000000000000000 RBX: ffff8800208fbc80 RCX: ffff880045fefd50
[34560.050396] RDX: ffff880031c19ec0 RSI: ffff880045fefc88 RDI: ffff8800208fbc80
[34560.051041] RBP: ffff880045fefc28 R08: ffff8800208fbe68 R09: ffff880045fefc88
[34560.051666] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880045fefc78
[34560.052247] R13: ffff880045fefc88 R14: ffff880045fefa90 R15: ffff880045fefd50
[34560.052825] FS:  00007fa02d58c740(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
[34560.053410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[34560.053992] CR2: 0000000000000068 CR3: 000000003b37a000 CR4: 00000000001406f0
[34560.054615] Stack:
[34560.055200]  ffff8800208fbc80 ffff8800208fbc80 ffff880045fefcc8 ffffffffa05c1a5b
[34560.055800]  ffff880045fefcc8 ffff880045fefd50 0000000045fefcb8 ffff880045fefd40
[34560.056418]  ffff8800420608e0 ffffffffa04f3910 0000000100000001 ffff880045fefd50
[34560.057013] Call Trace:
[34560.057672]  [<ffffffffa05c1a5b>] pnfs_generic_commit_pagelist+0x1cb/0x300 [nfsv4]
[34560.058277]  [<ffffffffa04f3910>] ? ff_layout_commit_pagelist+0x20/0x20 [nfs_layout_flexfiles]
[34560.058907]  [<ffffffffa04f3905>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[34560.059557]  [<ffffffffa0543fc1>] nfs_generic_commit_list+0xb1/0xf0 [nfs]
[34560.060214]  [<ffffffffa0543e47>] ? nfs_scan_commit+0x37/0xa0 [nfs]
[34560.060825]  [<ffffffffa0544081>] nfs_commit_inode+0x81/0x150 [nfs]
[34560.061432]  [<ffffffffa05443ae>] nfs_wb_all+0x1ae/0x400 [nfs]
[34560.062035]  [<ffffffffa05380ad>] nfs_getattr+0x33d/0x510 [nfs]
[34560.062630]  [<ffffffff8122499c>] vfs_getattr_nosec+0x2c/0x40
[34560.063223]  [<ffffffff81224a66>] vfs_getattr+0x26/0x30
[34560.063818]  [<ffffffff81224b35>] vfs_fstatat+0x65/0xa0
[34560.064413]  [<ffffffff81224f3f>] SYSC_newstat+0x1f/0x40
[34560.065016]  [<ffffffff8102b176>] ? do_audit_syscall_entry+0x66/0x70
[34560.065626]  [<ffffffff8102c773>] ? syscall_trace_enter_phase1+0x113/0x170
[34560.066245]  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
[34560.066868]  [<ffffffff812251ae>] SyS_newstat+0xe/0x10
[34560.067533]  [<ffffffff817a5df2>] entry_SYSCALL_64_fastpath+0x16/0x7a
[34560.068173] Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 4c 8d 87 e8 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce 48 8b 40 40 <4c> 8b 50 68 74 24 48 8b 87 e8 01 00 00 48 8b 7e 08 4d 89 41 08
[34560.069609] RIP  [<ffffffffa05423d6>] nfs_init_commit+0x26/0x130 [nfs]
[34560.070295]  RSP <ffff880045fefc18>
[34560.071008] CR2: 0000000000000068
[34560.073207] ---[ end trace f85f873260977406 ]---

[fixes 27571297a7e(pNFS: Tighten up locking around DS commit buckets)]
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4: Enable delegated opens even when reboot recovery is pending
Trond Myklebust [Thu, 20 Aug 2015 03:30:00 +0000 (22:30 -0500)]
NFSv4: Enable delegated opens even when reboot recovery is pending

Unlike the previous attempt, this takes into account the fact that
we may be calling it from the recovery thread itself. Detect this
by looking at what kind of open we're doing, and checking the state
of the NFS_DELEGATION_NEED_RECLAIM if it turns out we're doing a
reboot reclaim-type open.

Cc: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopNFS: Fix an unused variable warning in pnfs_roc_get_barrier
Trond Myklebust [Thu, 20 Aug 2015 04:00:50 +0000 (23:00 -0500)]
pNFS: Fix an unused variable warning in pnfs_roc_get_barrier

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoSUNRPC: Allow sockets to do GFP_NOIO allocations
Trond Myklebust [Thu, 20 Aug 2015 02:46:15 +0000 (21:46 -0500)]
SUNRPC: Allow sockets to do GFP_NOIO allocations

Follow up to commit c4a7ca774949 ("SUNRPC: Allow waiting on memory
allocation"). Allows the RPC socket code to do non-IO blocking.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS41/flexfiles: update inode after write finishes
Peng Tao [Wed, 19 Aug 2015 17:52:59 +0000 (01:52 +0800)]
NFS41/flexfiles: update inode after write finishes

Otherwise we break fstest case tests/read_write/mctime.t

Does files layout need the same fix as well?

Cc: stable@vger.kernel.org # v4.0+
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS41: make sure sending LAYOUTRETURN before close if marked so
Peng Tao [Wed, 19 Aug 2015 05:49:19 +0000 (13:49 +0800)]
NFS41: make sure sending LAYOUTRETURN before close if marked so

If layout is marked by NFS_LAYOUT_RETURN_BEFORE_CLOSE, we should always
send LAYOUTRETURN before close, and we don't need to do ROC drain if we
do send LAYOUTRETURN.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoRevert "NFSv4: Remove incorrect check in can_open_delegated()"
Trond Myklebust [Wed, 19 Aug 2015 05:14:20 +0000 (00:14 -0500)]
Revert "NFSv4: Remove incorrect check in can_open_delegated()"

This reverts commit 4e379d36c050b0117b5d10048be63a44f5036115.

This commit opens up a race between the recovery code and the open code.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Cc: stable@vger.kernel # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Play safe w.r.t. close() races when return-on-close is set
Trond Myklebust [Wed, 19 Aug 2015 04:45:13 +0000 (23:45 -0500)]
NFSv4.1/pnfs: Play safe w.r.t. close() races when return-on-close is set

If we have an OPEN_DOWNGRADE and CLOSE race with one another, we want
to ensure that the layout is forgotten by the client, so that we
start afresh with a new layoutget.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is set
Trond Myklebust [Wed, 19 Aug 2015 04:23:21 +0000 (23:23 -0500)]
NFSv4.1/pnfs: Fix a close/delegreturn hang when return-on-close is set

The helper pnfs_roc() has already verified that we have no delegations,
and no further open files, hence no outstanding I/O and it has marked
all the return-on-close lsegs as being invalid.
Furthermore, it sets the NFS_LAYOUT_RETURN bit, thus serialising the
close/delegreturn with all future layoutget calls on this inode.

The checks in pnfs_roc_drain() for valid layout segments are therefore
redundant: those cannot exist until another layoutget completes.
The other check for whether or not NFS_LAYOUT_RETURN is set, actually
causes a hang, since we already know that we hold that flag.

To fix, we therefore strip out all the functionality in pnfs_roc_drain()
except the retrieval of the barrier state, and then rename the function
accordingly.

Reported-by: Christoph Hellwig <hch@infradead.org>
Fixes: 5c4a79fb2b1c ("Don't prevent layoutgets when doing return-on-close")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Don't fsync twice for O_SYNC/IS_SYNC files
Trond Myklebust [Mon, 17 Aug 2015 21:55:18 +0000 (16:55 -0500)]
NFS: Don't fsync twice for O_SYNC/IS_SYNC files

generic_file_write_iter() will already do an fsync on our behalf
if the file descriptor is O_SYNC or the file is marked as IS_SYNC.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoSUNRPC: Drop double-underscores from __rpc_cmp_addr6()
Trond Myklebust [Mon, 17 Aug 2015 19:46:51 +0000 (14:46 -0500)]
SUNRPC: Drop double-underscores from __rpc_cmp_addr6()

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 7b0ce60c0b20 ("SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Don't let the ctime override attribute barriers.
Trond Myklebust [Thu, 6 Aug 2015 16:06:30 +0000 (12:06 -0400)]
NFS: Don't let the ctime override attribute barriers.

Chuck reports seeing cases where a GETATTR that happens to race
with an asynchronous WRITE is overriding the file size, despite
the attribute barrier being set by the writeback code.

The culprit turns out to be the check in nfs_ctime_need_update(),
which sees that the ctime is newer than the cached ctime, and
assumes that it is safe to override the attribute barrier.
This patch removes that override, and ensures that attribute
barriers are always respected.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: a08a8cd375db9 ("NFS: Add attribute update barriers to NFS writebacks")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoMerge branch 'layoutfixes'
Trond Myklebust [Mon, 17 Aug 2015 18:36:32 +0000 (13:36 -0500)]
Merge branch 'layoutfixes'

* layoutfixes:
  NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()
  NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()
  NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn
  NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close
  NFSv4.1/pnfs: Fix serialisation of layout return and layoutget
  NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()
  pNFS: Tighten up locking around DS commit buckets

9 years agoMerge branch 'bugfixes'
Trond Myklebust [Mon, 17 Aug 2015 18:36:22 +0000 (13:36 -0500)]
Merge branch 'bugfixes'

* bugfixes:
  SUNRPC: Fix a thinko in xs_connect()
  NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
  NFS: nfs_set_pgio_error sometimes misses errors

9 years agoMerge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdma
Trond Myklebust [Mon, 17 Aug 2015 18:33:58 +0000 (13:33 -0500)]
Merge tag 'nfs-rdma-for-4.3' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: NFS over RDMA Client Side Changes

These patches improve both client performance and scalability, most notably
by increasing the maixmum allowed rsize and wsize and by increasing the number
of RDMA "credits".  There are also several bugfixes, such as correcting how
WRITE compounds are encoded and fixing large NFS symlink operations.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoNFS: Remove nfs_release()
Anna Schumaker [Mon, 13 Jul 2015 18:01:33 +0000 (14:01 -0400)]
NFS: Remove nfs_release()

And call nfs_file_clear_open_context() directly.  This makes it obvious
that nfs_file_release() will always return 0.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Rename nfs_commit_unstable_pages() to nfs_write_inode()
Anna Schumaker [Mon, 13 Jul 2015 18:01:32 +0000 (14:01 -0400)]
NFS: Rename nfs_commit_unstable_pages() to nfs_write_inode()

All nfs_write_inode() does is pass its arguments to
nfs_commit_unstable_pages().  Let's cut out the middle man and have
nfs_write_pages() do the work directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Remove nfs41_server_notify_{target|highest}_slotid_update()
Anna Schumaker [Mon, 13 Jul 2015 18:01:31 +0000 (14:01 -0400)]
NFS: Remove nfs41_server_notify_{target|highest}_slotid_update()

All these functions do is call nfs41_ping_server() without adding
anything.  Let's remove them and give nfs41_ping_server() a better name
instead.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Combine nfs_idmap_{init|quit}() and nfs_idmap_{init|quit}_keyring()
Anna Schumaker [Mon, 13 Jul 2015 18:01:29 +0000 (14:01 -0400)]
NFS: Combine nfs_idmap_{init|quit}() and nfs_idmap_{init|quit}_keyring()

The idmap_init() and idmap_quit() functions only exist to call the
_keyring() version.  Let's just call the keyring() functions directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Use RPC functions for matching sockaddrs
Anna Schumaker [Mon, 13 Jul 2015 18:01:28 +0000 (14:01 -0400)]
NFS: Use RPC functions for matching sockaddrs

They already exist and do the exact same thing.  Let's save ourselves
several lines of code!

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoSUNRPC: Add an rpc_cmp_addr_port() function
Anna Schumaker [Mon, 13 Jul 2015 18:01:27 +0000 (14:01 -0400)]
SUNRPC: Add an rpc_cmp_addr_port() function

This function is to help determine if two sockaddrs are really the same
socket.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoSUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()
Anna Schumaker [Mon, 13 Jul 2015 18:01:26 +0000 (14:01 -0400)]
SUNRPC: Drop double-underscores from rpc_cmp_addr{4|6}()

I'm planning on using these functions inside the client, so remove the
underscores to make it feel like I'm using a public interface.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Rename nfs_readdir_free_pagearray() and nfs_readdir_large_page()
Anna Schumaker [Mon, 13 Jul 2015 18:01:25 +0000 (14:01 -0400)]
NFS: Rename nfs_readdir_free_pagearray() and nfs_readdir_large_page()

nfs_readdir_xdr_to_array() uses both a cache array and an array of
pages, so I rename these functions to make it clearer how the code
works.  nfs_readdir_large_page() becomes nfs_readdir_alloc_pages()
because this function has absolutely nothing to do with setting up a
large page.  nfs_readdir_free_pagearray() becomes
nfs_readdir_free_pages() to stay consistent with the new alloc_pages()
function.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Remove unused variable "pages_ptr"
Anna Schumaker [Mon, 13 Jul 2015 18:01:24 +0000 (14:01 -0400)]
NFS: Remove unused variable "pages_ptr"

This variable is initialized to NULL and is never modified before being
passed to nfs_readdir_free_large_page().  But that's okay, because
nfs_readdir_free_large_page() only seems to exist as a way of calling
nfs_readdir_free_pagearray() without this parameter.  Let's simplify by
removing pages_ptr and nfs_readdir_free_pagearray().

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agonfs: remove some dead code in ff_layout_pg_get_mirror_count_write
Jeff Layton [Fri, 10 Jul 2015 19:59:26 +0000 (15:59 -0400)]
nfs: remove some dead code in ff_layout_pg_get_mirror_count_write

We already know that pg_lseg is NULL here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs: move common blocklayout XDR defintions to nfs4.h
Christoph Hellwig [Mon, 17 Aug 2015 16:41:01 +0000 (18:41 +0200)]
pnfs: move common blocklayout XDR defintions to nfs4.h

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs/blocklayout: pass proper file mode to blkdev_get/put
Christoph Hellwig [Mon, 17 Aug 2015 16:41:00 +0000 (18:41 +0200)]
pnfs/blocklayout: pass proper file mode to blkdev_get/put

We generally want to read and write to a block device that's used by
the pNFS block layout client (and even if it's read only the server
has no way of telling us).  Add FMODE_WRITE to the mode argument
so that we don't incorrectly tell the block driver that we want a
read-only open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs/blocklayout: reject too long signatures
Christoph Hellwig [Mon, 17 Aug 2015 16:40:59 +0000 (18:40 +0200)]
pnfs/blocklayout: reject too long signatures

Instead of overwriting kernel memory reject too long signatures.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs/blocklayout: set up layoutupdate_pages properly
Christoph Hellwig [Mon, 17 Aug 2015 16:40:58 +0000 (18:40 +0200)]
pnfs/blocklayout: set up layoutupdate_pages properly

We need to replace the __be32 with a void pointer to do proper arithmentics
on the virtual addresses so that we can get the right page pointers.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs/blocklayout: calculate layoutupdate size correctly
Christoph Hellwig [Mon, 17 Aug 2015 16:40:57 +0000 (18:40 +0200)]
pnfs/blocklayout: calculate layoutupdate size correctly

We need to include the first u32 for the number of entries.  Add a helper
for the calculation instead of opencoding it so that it's in one place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client
Kinglong Mee [Sat, 15 Aug 2015 13:52:10 +0000 (21:52 +0800)]
NFS: Fix a NULL pointer dereference of migration recovery ops for v4.2 client

---Steps to Reproduce--
<nfs-server>
# cat /etc/exports
/nfs/referal  *(rw,insecure,no_subtree_check,no_root_squash,crossmnt)
/nfs/old      *(ro,insecure,subtree_check,root_squash,crossmnt)

<nfs-client>
# mount -t nfs nfs-server:/nfs/ /mnt/
# ll /mnt/*/

<nfs-server>
# cat /etc/exports
/nfs/referal   *(rw,insecure,no_subtree_check,no_root_squash,crossmnt,refer=/nfs/old/@nfs-server)
/nfs/old       *(ro,insecure,subtree_check,root_squash,crossmnt)
# service nfs restart

<nfs-client>
# ll /mnt/*/    --->>>>> oops here

[ 5123.102925] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 5123.103363] IP: [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.103752] PGD 587b9067 PUD 3cbf5067 PMD 0
[ 5123.104131] Oops: 0000 [#1]
[ 5123.104529] Modules linked in: nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev vmw_balloon parport_pc parport i2c_piix4 shpchp auth_rpcgss nfs_acl vmw_vmci lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi serio_raw scsi_transport_spi e1000 mptscsih mptbase ata_generic pata_acpi [last unloaded: nfsd]
[ 5123.105887] CPU: 0 PID: 15853 Comm: ::1-manager Tainted: G           OE   4.2.0-rc6+ #214
[ 5123.106358] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 5123.106860] task: ffff88007620f300 ti: ffff88005877c000 task.ti: ffff88005877c000
[ 5123.107363] RIP: 0010:[<ffffffffa03ed38b>]  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.107909] RSP: 0018:ffff88005877fdb8  EFLAGS: 00010246
[ 5123.108435] RAX: ffff880053f3bc00 RBX: ffff88006ce6c908 RCX: ffff880053a0d240
[ 5123.108968] RDX: ffffea0000e6d940 RSI: ffff8800399a0000 RDI: ffff88006ce6c908
[ 5123.109503] RBP: ffff88005877fe28 R08: ffffffff81c708a0 R09: 0000000000000000
[ 5123.110045] R10: 00000000000001a2 R11: ffff88003ba7f5c8 R12: ffff880054c55800
[ 5123.110618] R13: 0000000000000000 R14: ffff880053a0d240 R15: ffff880053a0d240
[ 5123.111169] FS:  0000000000000000(0000) GS:ffffffff81c27000(0000) knlGS:0000000000000000
[ 5123.111726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5123.112286] CR2: 0000000000000000 CR3: 0000000054cac000 CR4: 00000000001406f0
[ 5123.112888] Stack:
[ 5123.113458]  ffffea0000e6d940 ffff8800399a0000 00000000000167d0 0000000000000000
[ 5123.114049]  0000000000000000 0000000000000000 0000000000000000 00000000a7ec82c6
[ 5123.114662]  ffff88005877fe18 ffffea0000e6d940 ffff8800399a0000 ffff880054c55800
[ 5123.115264] Call Trace:
[ 5123.115868]  [<ffffffffa03fb44b>] nfs4_try_migration+0xbb/0x220 [nfsv4]
[ 5123.116487]  [<ffffffffa03fcb3b>] nfs4_run_state_manager+0x4ab/0x7b0 [nfsv4]
[ 5123.117104]  [<ffffffffa03fc690>] ? nfs4_do_reclaim+0x510/0x510 [nfsv4]
[ 5123.117813]  [<ffffffff810a4527>] kthread+0xd7/0xf0
[ 5123.118456]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.119108]  [<ffffffff816d9cdf>] ret_from_fork+0x3f/0x70
[ 5123.119723]  [<ffffffff810a4450>] ? kthread_worker_fn+0x160/0x160
[ 5123.120329] Code: 4c 8b 6a 58 74 17 eb 52 48 8d 55 a8 89 c6 4c 89 e7 e8 4a b5 ff ff 8b 45 b0 85 c0 74 1c 4c 89 f9 48 8b 55 90 48 8b 75 98 48 89 df <41> ff 55 00 3d e8 d8 ff ff 41 89 c6 74 cf 48 8b 4d c8 65 48 33
[ 5123.121643] RIP  [<ffffffffa03ed38b>] nfs4_proc_get_locations+0x9b/0x120 [nfsv4]
[ 5123.122308]  RSP <ffff88005877fdb8>
[ 5123.122942] CR2: 0000000000000000

Fixes: ec011fe847 ("NFS: Introduce a vector of migration recovery ops")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoSUNRPC: Fix a thinko in xs_connect()
Trond Myklebust [Thu, 13 Aug 2015 19:33:51 +0000 (15:33 -0400)]
SUNRPC: Fix a thinko in xs_connect()

It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()
Trond Myklebust [Thu, 13 Aug 2015 14:59:07 +0000 (10:59 -0400)]
NFSv4.1/pNFS: Fix borken function _same_data_server_addrs_locked()

- Switch back to using list_for_each_entry(). Fixes an incorrect test
  for list NULL termination.
- Do not assume that lists are sorted.
- Finally, consider an existing entry to match if it consists of a subset
  of the addresses in the new entry.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: nfs_set_pgio_error sometimes misses errors
Trond Myklebust [Mon, 17 Aug 2015 17:57:07 +0000 (12:57 -0500)]
NFS: nfs_set_pgio_error sometimes misses errors

We should ensure that we always set the pgio_header's error field
if a READ or WRITE RPC call returns an error. The current code depends
on 'hdr->good_bytes' always being initialised to a large value, which
is not always done correctly by callers.
When this happens, applications may end up missing important errors.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()
Trond Myklebust [Tue, 4 Aug 2015 21:18:10 +0000 (17:18 -0400)]
NFSv4.1/pnfs: Remove redundant wakeup in pnfs_send_layoutreturn()

pnfs_clear_layoutreturn_waitbit() should already be calling
rpc_wake_up(&NFS_SERVER(ino)->roc_rpcwaitq) for us.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()
Trond Myklebust [Tue, 4 Aug 2015 20:40:08 +0000 (16:40 -0400)]
NFSv4.1/pnfs: Remove redundant check in pnfs_layoutgets_blocked()

layoutget now should already be serialised w.r.t. layout returns

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn
Trond Myklebust [Tue, 4 Aug 2015 20:15:48 +0000 (16:15 -0400)]
NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets in layoutreturn

The NFS_LAYOUT_RETURN bit already suffices to ensure that layoutget
is blocked.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close
Trond Myklebust [Tue, 4 Aug 2015 19:57:13 +0000 (15:57 -0400)]
NFSv4.1/pnfs: Don't prevent layoutgets when doing return-on-close

If there is an outstanding return-on-close, then we just want new
layoutget requests to wait rather than fail.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Fix serialisation of layout return and layoutget
Trond Myklebust [Tue, 4 Aug 2015 20:09:44 +0000 (16:09 -0400)]
NFSv4.1/pnfs: Fix serialisation of layout return and layoutget

We should always test for outstanding layout returns, whether or not
pnfs_should_retry_layoutget() is true.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()
Trond Myklebust [Tue, 4 Aug 2015 19:41:50 +0000 (15:41 -0400)]
NFSv4.1/pnfs: Remove redundant checks in pnfs_layoutgets_blocked()

If there are no valid layout segments, then we should already have
checked in pnfs_update_layout() whether or not this is the first
layoutget.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopNFS: Tighten up locking around DS commit buckets
Trond Myklebust [Mon, 3 Aug 2015 21:38:33 +0000 (17:38 -0400)]
pNFS: Tighten up locking around DS commit buckets

I'm not aware of any bugreports around this issue, but the locking
around the pnfs_commit_bucket is inconsistent at best. This patch
tightens it up by ensuring that the 'bucket->committing' list is always
changed atomically w.r.t. the 'bucket->clseg' layout segment tracking.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Remove duplicate svc_xprt_put from nfs41_callback_up
Kinglong Mee [Thu, 30 Jul 2015 13:40:06 +0000 (21:40 +0800)]
NFS: Remove duplicate svc_xprt_put from nfs41_callback_up

The xprt created by svc_create_xprt have be added to serv->sv_permsocks.
So putting the xprt directly is useless.
Otherwise, there is a more svc_xprt_put after the xprt be freed.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4: don't set SETATTR for O_RDONLY|O_EXCL
NeilBrown [Thu, 30 Jul 2015 03:00:56 +0000 (13:00 +1000)]
NFSv4: don't set SETATTR for O_RDONLY|O_EXCL

It is unusual to combine the open flags O_RDONLY and O_EXCL, but
it appears that libre-office does just that.

[pid  3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0
[pid  3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL <unfinished ...>

NFSv4 takes O_EXCL as a sign that a setattr command should be sent,
probably to reset the timestamps.

When it was an O_RDONLY open, the SETATTR command does not
identify any actual attributes to change.
If no delegation was provided to the open, the SETATTR uses the
all-zeros stateid and the request is accepted (at least by the
Linux NFS server - no harm, no foul).

If a read-delegation was provided, this is used in the SETATTR
request, and a Netapp filer will justifiably claim
NFS4ERR_BAD_STATEID, which the Linux client takes as a sign
to retry - indefinitely.

So only treat O_EXCL specially if O_CREAT was also given.

Signed-off-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFS: Error out when register_shrinker fail in register_nfs_fs
Kinglong Mee [Thu, 30 Jul 2015 13:41:08 +0000 (21:41 +0800)]
NFS: Error out when register_shrinker fail in register_nfs_fs

Commit 1d3d4437ea "vmscan: per-node deferred work" have made
register_shrinker can return an intergater error.

If register_shrinker() fail, the later unregister_shrinker() will
 cause a NULL pointer access.

v2, same as v1.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agosunrpc: increase UNX_MAXNODENAME from 32 to __NEW_UTS_LEN bytes
Jeff Layton [Mon, 3 Aug 2015 11:44:53 +0000 (07:44 -0400)]
sunrpc: increase UNX_MAXNODENAME from 32 to __NEW_UTS_LEN bytes

The current limit of 32 bytes artificially limits the name string that
we end up stuffing into NFSv4.x client ID blobs. If you have multiple
hosts with long hostnames that only differ near the end, then this can
cause NFSv4 client ID collisions.

Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use
that as the limit instead. Also, use XDR_QUADLEN to specify the slack
length, just for clarity and in case someone in the future changes this
to something not evenly divisible by 4.

Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback path
Trond Myklebust [Wed, 5 Aug 2015 21:31:58 +0000 (17:31 -0400)]
NFSv4.2/pnfs: Use GFP_NOIO for layoutstat reporting in the writeback path

Prevent a potential deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agopnfs/flexfiles: LAYOUTSTATS ii_count should be ops instead of bytes
Peng Tao [Mon, 10 Aug 2015 08:47:32 +0000 (16:47 +0800)]
pnfs/flexfiles: LAYOUTSTATS ii_count should be ops instead of bytes

Turned out I misinterpreted the spec...

Cc: Tom Haynes <thomas.haynes@primarydata.com>
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoNFSv4.1/pnfs: Fix atomicity of commit list updates
Trond Myklebust [Fri, 31 Jul 2015 20:24:30 +0000 (16:24 -0400)]
NFSv4.1/pnfs: Fix atomicity of commit list updates

pnfs_layout_mark_request_commit() needs to ensure that it adds the
request to the commit list atomically with all the other updates
in order to prevent corruption to buckets[ds_commit_idx].wlseg
due to races with pnfs_generic_clear_request_commit().

Fixes: 338d00cfef07d ("pnfs: Refactor the *_layout_mark_request_commit...")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
9 years agoxprtrdma: take HCA driver refcount at client
Devesh Sharma [Mon, 3 Aug 2015 17:05:04 +0000 (13:05 -0400)]
xprtrdma: take HCA driver refcount at client

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In presence of active mount if someone tries to rmmod vendor-driver, the
command remains stuck forever waiting for destruction of all rdma-cm-id.
in worst case client can crash during shutdown with active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. xprtrdma do not have support for
DEVICE_REMOVAL event either. Lifting that assumption and adding support
for DEVICE_REMOVAL event is a long chain of work, and is in plan.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Thus, this patch introduces a temporary workaround to acquire HCA driver
module reference count during the mount of a nfs-rdma mount point.

Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agocore: Remove the ib_reg_phys_mr() and ib_rereg_phys_mr() verbs
Chuck Lever [Mon, 3 Aug 2015 17:04:54 +0000 (13:04 -0400)]
core: Remove the ib_reg_phys_mr() and ib_rereg_phys_mr() verbs

The verbs are obsolete. The ib_rereg_phys_mr() verb is not used by
kernel ULPs, and the last ib_reg_phys_mr() call site in the kernel
tree has now been removed.

Two staging tree call sites remain in the Lustre client. The Lustre
team has been notified of the deprecation of reg_phys_mr.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Count RDMA_NOMSG type calls
Chuck Lever [Mon, 3 Aug 2015 17:04:45 +0000 (13:04 -0400)]
xprtrdma: Count RDMA_NOMSG type calls

RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG
calls so administrators can tell if they happen to be used more than
expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Clean up xprt_rdma_print_stats()
Chuck Lever [Mon, 3 Aug 2015 17:04:36 +0000 (13:04 -0400)]
xprtrdma: Clean up xprt_rdma_print_stats()

checkpatch.pl complained about the seq_printf() format string split
across lines and the use of %Lu.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Fix large NFS SYMLINK calls
Chuck Lever [Mon, 3 Aug 2015 17:04:26 +0000 (13:04 -0400)]
xprtrdma: Fix large NFS SYMLINK calls

Repair how rpcrdma_marshal_req() chooses which RDMA message type
to use for large non-WRITE operations so that it picks RDMA_NOMSG
in the correct situations, and sets up the marshaling logic to
SEND only the RPC/RDMA header.

Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS
server XDR decoder for NFSv2 SYMLINK does not handle having the
pathname argument arrive in a separate buffer. The decoder could be
fixed, but this is simpler and RDMA_NOMSG can be used in a variety
of other situations.

Ensure that the Linux client continues to use "RDMA_MSG + read
list" when sending large NFSv3 SYMLINK requests, which is more
efficient than using RDMA_NOMSG.

Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG +
read list" just like NFSv3 (see Section 5 of RFC 5667). Before,
these did not work at all.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Fix XDR tail buffer marshalling
Chuck Lever [Mon, 3 Aug 2015 17:04:17 +0000 (13:04 -0400)]
xprtrdma: Fix XDR tail buffer marshalling

Currently xprtrdma appends an extra chunk element to the RPC/RDMA
read chunk list of each NFSv4 WRITE compound. The extra element
contains the final GETATTR operation in the compound.

The result is an extra RDMA READ operation to transfer a very short
piece of each NFS WRITE compound (typically 16 bytes). This is
inefficient.

It is also incorrect.

The client is sending the trailing GETATTR at the same Position as
the preceding WRITE data payload. Whether or not RFC 5667 allows
the GETATTR to appear in a read chunk, RFC 5666 requires that these
two separate RPC arguments appear at two distinct Positions.

It can also be argued that the GETATTR operation is not bulk data,
and therefore RFC 5667 forbids its appearance in a read chunk at
all.

Although RFC 5667 is not precise about when using a read list with
NFSv4 COMPOUND is allowed, the intent is that only data arguments
not touched by NFS (ie, read and write payloads) are to be sent
using RDMA READ or WRITE.

The NFS client constructs GETATTR arguments itself, and therefore is
required to send the trailing GETATTR operation as additional inline
content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do
not accept inline content following the read list. The Linux NFS
server should handle this content correctly as of commit
a97c331f9aa9 ("svcrdma: Handle additional inline content").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Don't provide a reply chunk when expecting a short reply
Chuck Lever [Mon, 3 Aug 2015 17:04:08 +0000 (13:04 -0400)]
xprtrdma: Don't provide a reply chunk when expecting a short reply

Currently Linux always offers a reply chunk, even when the reply
can be sent inline (ie. is smaller than 1KB).

On the client, registering a memory region can be expensive. A
server may choose not to use the reply chunk, wasting the cost of
the registration.

This is a change only for RPC replies smaller than 1KB which the
server constructs in the RPC reply send buffer. Because the elements
of the reply must be XDR encoded, a copy-free data transfer has no
benefit in this case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Always provide a write list when sending NFS READ
Chuck Lever [Mon, 3 Aug 2015 17:03:58 +0000 (13:03 -0400)]
xprtrdma: Always provide a write list when sending NFS READ

The client has been setting up a reply chunk for NFS READs that are
smaller than the inline threshold. This is not efficient: both the
server and client CPUs have to copy the reply's data payload into
and out of the memory region that is then transferred via RDMA.

Using the write list, the data payload is moved by the device and no
extra data copying is necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Account for RPC/RDMA header size when deciding to inline
Chuck Lever [Mon, 3 Aug 2015 17:03:49 +0000 (13:03 -0400)]
xprtrdma: Account for RPC/RDMA header size when deciding to inline

When the size of the RPC message is near the inline threshold (1KB),
the client would allow messages to be sent that were a few bytes too
large.

When marshaling RPC/RDMA requests, ensure the combined size of
RPC/RDMA header and RPC header do not exceed the inline threshold.
Endpoints typically reject RPC/RDMA messages that exceed the size
of their receive buffers.

The two server implementations I test with (Linux and Solaris) use
receive buffers that are larger than the client’s inline threshold.
Thus so far this has been benign, observed only by code inspection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Remove logic that constructs RDMA_MSGP type calls
Chuck Lever [Mon, 3 Aug 2015 17:03:39 +0000 (13:03 -0400)]
xprtrdma: Remove logic that constructs RDMA_MSGP type calls

RDMA_MSGP type calls insert a zero pad in the middle of the RPC
message to align the RPC request's data payload to the server's
alignment preferences. A server can then "page flip" the payload
into place to avoid a data copy in certain circumstances. However:

1. The client has to have a priori knowledge of the server's
   preferred alignment

2. Requests eligible for RDMA_MSGP are requests that are small
   enough to have been sent inline, and convey a data payload
   at the _end_ of the RPC message

Today 1. is done with a sysctl, and is a global setting that is
copied during mount. Linux does not support CCP to query the
server's preferences (RFC 5666, Section 6).

A small-ish NFSv3 WRITE might use RDMA_MSGP, but no NFSv4
compound fits bullet 2.

Thus the Linux client currently leaves RDMA_MSGP disabled. The
Linux server handles RDMA_MSGP, but does not use any special
page flipping, so it confers no benefit.

Clean up the marshaling code by removing the logic that constructs
RDMA_MSGP type calls. This also reduces the maximum send iovec size
from four to just two elements.

/proc/sys/sunrpc/rdma_inline_write_padding is a kernel API, and
thus is left in place.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Clean up rpcrdma_ia_open()
Chuck Lever [Mon, 3 Aug 2015 17:03:30 +0000 (13:03 -0400)]
xprtrdma: Clean up rpcrdma_ia_open()

Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which
is different for each registration method, to the .ro_open functions.

This is refactoring only. No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Remove last ib_reg_phys_mr() call site
Chuck Lever [Mon, 3 Aug 2015 17:03:20 +0000 (13:03 -0400)]
xprtrdma: Remove last ib_reg_phys_mr() call site

All HCA providers have an ib_get_dma_mr() verb. Thus
rpcrdma_ia_open() will either grab the device's local_dma_key if one
is available, or it will call ib_get_dma_mr(). If ib_get_dma_mr()
fails, rpcrdma_ia_open() fails and no transport is created.

Therefore execution never reaches the ib_reg_phys_mr() call site in
rpcrdma_register_internal(), so it can be removed.

The remaining logic in rpcrdma_{de}register_internal() is folded
into rpcrdma_{alloc,free}_regbuf().

This is clean up only. No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Don't fall back to PHYSICAL memory registration
Chuck Lever [Mon, 3 Aug 2015 17:03:09 +0000 (13:03 -0400)]
xprtrdma: Don't fall back to PHYSICAL memory registration

PHYSICAL memory registration uses a single rkey for all of the
client's memory, thus is insecure. It is still useful in some cases
for testing.

Retain the ability to select PHYSICAL memory registration capability
via /proc/sys/sunrpc/rdma_memreg_strategy, but don't fall back to it
if the HCA does not support FRWR or FMR.

This means amso1100 no longer works out of the box with NFS/RDMA.
When using amso1100 HCAs, set the memreg_strategy sysctl to 6 before
performing NFS/RDMA mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Increase default credit limit
Chuck Lever [Mon, 3 Aug 2015 17:02:59 +0000 (13:02 -0400)]
xprtrdma: Increase default credit limit

In preparation for similar increases on NFS/RDMA servers, bump the
advertised credit limit for RPC/RDMA to 128. This allocates some
extra resources, but the client will continue to allow only the
number of RPCs in flight that the server requests via its advertised
credit limit.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Raise maximum payload size to one megabyte
Chuck Lever [Mon, 3 Aug 2015 17:02:50 +0000 (13:02 -0400)]
xprtrdma: Raise maximum payload size to one megabyte

The point of larger rsize and wsize is to reduce the per-byte cost
of memory registration and deregistration. Modern HCAs can typically
handle a megabyte or more with a single registration operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-By: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoxprtrdma: Make xprt_setup_rdma() agnostic to family of server address
Chuck Lever [Mon, 3 Aug 2015 17:02:41 +0000 (13:02 -0400)]
xprtrdma: Make xprt_setup_rdma() agnostic to family of server address

In particular, recognize when an IPv6 connection is bound.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
9 years agoLinux 4.2-rc5
Linus Torvalds [Mon, 3 Aug 2015 01:34:55 +0000 (18:34 -0700)]
Linux 4.2-rc5

9 years agoMerge tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Mon, 3 Aug 2015 01:07:36 +0000 (18:07 -0700)]
Merge tag 'powerpc-4.2-3' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - TCE table memory calculation fix from Alexey
 - Build fix for ans-lcd from Luis
 - Unbalanced IRQ warning fix from Alistair

* tag 'powerpc-4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/eeh-powernv: Fix unbalanced IRQ warning
  macintosh/ans-lcd: fix build failure after module_init/exit relocation
  powerpc/powernv/ioda2: Fix calculation for memory allocated for TCE table

9 years agoi915: temporary fix for DP MST docking station NULL pointer dereference
Linus Torvalds [Thu, 30 Jul 2015 05:18:16 +0000 (22:18 -0700)]
i915: temporary fix for DP MST docking station NULL pointer dereference

Ted Ts'o reports that his Lenovo T540p ThinkPad crashes at boot if
attached to the docking station.  This is a regression that he was able
to bisect to commit 8c7b5ccb7298: "drm/i915: Use atomic helpers for
computing changed flags:"

The reason seems to be the new call to drm_atomic_helper_check_modeset()
added to intel_modeset_compute_config(), which in turn calls
update_connector_routing(), and somehow ends up picking a NULL crtc for
the connector state, causing the subsequent drm_crtc_index() to OOPS.

Daniel Vetter says that the fundamental issue seems to be confusion in
the encoder selection, and this isn't the right fix, but while he chases
down the proper fix, this at least avoids the NULL pointer dereference
and makes Ted's docking station work again.

Reported-bisected-and-tested-by: Theodore Ts'o <tytso@mit.edu>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Mani Nikula <jani.nikula@linux.intel.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 2 Aug 2015 16:36:21 +0000 (09:36 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "A set of three fixes for the ipr driver and one fairly major one for
  memory leaks in the mq path of SCSI"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: fix memory leak with scsi-mq
  ipr: Fix invalid array indexing for HRRQ
  ipr: Fix incorrect trace indexing
  ipr: Fix locking for unit attention handling

9 years agoMerge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 2 Aug 2015 16:12:46 +0000 (09:12 -0700)]
Merge tag 'armsoc-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Things are calming down nicely here w.r.t. fixes.  This batch
  includes two week's worth since I missed to send before -rc4.

  Nothing particularly scary to point out, smaller fixes here and there.
  Shortlog describes it pretty well"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: keystone: fix dt bindings to use post div register for mainpll
  ARM: nomadik: disable UART0 on Nomadik boards
  ARM: dts: i.MX35: Fix can support.
  ARM: OMAP2+: hwmod: Fix _wait_target_ready() for hwmods without sysc
  ARM: dts: add CPU OPP and regulator supply property for exynos4210
  ARM: dts: Update video-phy node with syscon phandle for exynos3250
  ARM: DRA7: hwmod: fix gpmc hwmod

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 2 Aug 2015 00:42:14 +0000 (17:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

Pull VFS fix from Al Viro:
 "Spurious ENOTDIR fix"

This should fix the problems reported by Dominique Martinet and Hugh
Dickins.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  link_path_walk(): be careful when failing with ENOTDIR

9 years agolink_path_walk(): be careful when failing with ENOTDIR
Al Viro [Sat, 1 Aug 2015 23:59:28 +0000 (19:59 -0400)]
link_path_walk(): be careful when failing with ENOTDIR

In RCU mode we might end up with dentry evicted just we check
that it's a directory.  In such case we should return ECHILD
rather than ENOTDIR, so that pathwalk would be retries in non-RCU
mode.

Breakage had been introduced in commit b18825a - prior to that
we were looking at nd->inode, which had been fetched before
verifying that ->d_seq was still valid.  That form of check
would only be satisfied if at some point the pathname prefix
would indeed have resolved to a non-directory.  The fix consists
of checking ->d_seq after we'd run into a non-directory dentry,
and failing with ECHILD in case of mismatch.

Note that all branches since 3.12 have that problem...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoMerge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sat, 1 Aug 2015 19:47:04 +0000 (12:47 -0700)]
Merge tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "We had a regression due to reuse of descriptor so we have reverted
  that.

  The rest are driver fixes:

   - at_hdmac and at_xdmac for residue, trannfer width, and channel config
   - pl330 final fix for dma fails and overflow issue
   - xgene resouce map fix
   - mv_xor big endian op fix"

* tag 'dmaengine-fix-4.2-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
  Revert "dmaengine: virt-dma: don't always free descriptor upon completion"
  dmaengine: mv_xor: fix big endian operation in register mode
  dmaengine: xgene-dma: Fix the resource map to handle overlapping
  dmaengine: at_xdmac: fix transfer data width in at_xdmac_prep_slave_sg()
  dmaengine: at_hdmac: fix residue computation
  dmaengine: at_xdmac: fix bug about channel configuration
  dmaengine: pl330: Really fix choppy sound because of wrong residue calculation
  dmaengine: pl330: Fix overflow when reporting residue in memcpy

9 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Aug 2015 16:47:11 +0000 (09:47 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fixlets from Thomas Gleixner:
 "Just two updates to the maintainers file"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Appoint Jiang and Marc as irqdomain maintainers
  MAINTAINERS: Appoint Marc Zyngier as irqchips co-maintainer

9 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Aug 2015 16:16:33 +0000 (09:16 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Fallout from the recent NMI fixes: make x86 LDT handling more robust.

  Also some EFI fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ldt: Make modify_ldt synchronous
  x86/xen: Probe target addresses in set_aliased_prot() before the hypercall
  x86/irq: Use the caller provided polarity setting in mp_check_pin_attr()
  efi: Check for NULL efi kernel parameters
  x86/efi: Use all 64 bit of efi_memmap in setup_e820()

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 1 Aug 2015 00:10:56 +0000 (17:10 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Must teardown SR-IOV before unregistering netdev in igb driver, from
    Alex Williamson.

 2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell.

 3) Default route selection in ipv4 should take the prefix length, table
    ID, and TOS into account, from Julian Anastasov.

 4) sch_plug must have a reset method in order to purge all buffered
    packets when the qdisc is reset, likewise for sch_choke, from WANG
    Cong.

 5) Fix deadlock and races in slave_changelink/br_setport in bridging.
    From Nikolay Aleksandrov.

 6) mlx4 bug fixes (wrong index in port even propagation to VFs,
    overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack
    Morgenstein, and Or Gerlitz.

 7) Turn off klog message about SCTP userspace interface compat that
    makes no sense at all, from Daniel Borkmann.

 8) Fix unbounded restarts of inet frag eviction process, causing NMI
    watchdog soft lockup messages, from Florian Westphal.

 9) Suspend/resume fixes for r8152 from Hayes Wang.

10) Fix busy loop when MSG_WAITALL|MSG_PEEK is used in TCP recv, from
    Sabrina Dubroca.

11) Fix performance regression when removing a lot of routes from the
    ipv4 routing tables, from Alexander Duyck.

12) Fix device leak in AF_PACKET, from Lars Westerhoff.

13) AF_PACKET also has a header length comparison bug due to signedness,
    from Alexander Drozdov.

14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann.

15) Memory leaks, TSO stats, watchdog timeout and other fixes to
    thunderx driver from Sunil Goutham and Thanneeru Srinivasulu.

16) act_bpf can leak memory when replacing programs, from Daniel
    Borkmann.

17) WOL packet fixes in gianfar driver, from Claudiu Manoil.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  stmmac: fix missing MODULE_LICENSE in stmmac_platform
  gianfar: Enable device wakeup when appropriate
  gianfar: Fix suspend/resume for wol magic packet
  gianfar: Fix warning when CONFIG_PM off
  act_pedit: check binding before calling tcf_hash_release()
  net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket
  net: sched: fix refcount imbalance in actions
  r8152: reset device when tx timeout
  r8152: add pre_reset and post_reset
  qlcnic: Fix corruption while copying
  act_bpf: fix memory leaks when replacing bpf programs
  net: thunderx: Fix for crash while BGX teardown
  net: thunderx: Add PCI driver shutdown routine
  net: thunderx: Fix crash when changing rss with mutliple traffic flows
  net: thunderx: Set watchdog timeout value
  net: thunderx: Wakeup TXQ only if CQE_TX are processed
  net: thunderx: Suppress alloc_pages() failure warnings
  net: thunderx: Fix TSO packet statistic
  net: thunderx: Fix memory leak when changing queue count
  net: thunderx: Fix RQ_DROP miscalculation
  ...

9 years agoMerge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason...
Linus Torvalds [Sat, 1 Aug 2015 00:05:37 +0000 (17:05 -0700)]
Merge branch 'for-linus-4.2' of git://git./linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Filipe fixed up a hard to trigger ENOSPC regression from our merge
  window pull, and we have a few other smaller fixes"

* 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix quick exhaustion of the system array in the superblock
  btrfs: its btrfs_err() instead of btrfs_error()
  btrfs: Avoid NULL pointer dereference of free_extent_buffer when read_tree_block() fail
  btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()

9 years agoMerge tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 1 Aug 2015 00:00:25 +0000 (17:00 -0700)]
Merge tag 'sound-4.2-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "This became a relative big update as it includes the collected ASoC
  fixes.  There are a few fixes in ASoC core side, mostly for DAPM and
  the new topology API.  The rest are various ASoC driver-specific
  fixes, as well as the usual HD-audio and USB-audio quirks"

* tag 'sound-4.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
  ALSA: hda - Fix MacBook Pro 5,2 quirk
  ALSA: hda - Fix race between PM ops and HDA init/probe
  ALSA: usb-audio: add dB range mapping for some devices
  ALSA: hda - Apply a fixup to Dell Vostro 5480
  ALSA: hda - Add pin quirk for the headset mic jack detection on Dell laptop
  ALSA: hda - Apply fixup for another Toshiba Satellite S50D
  ALSA: fireworks: add support for AudioFire2 quirk
  ALSA: hda - Fix the headset mic that will not work on Dell desktop machine
  ALSA: hda - fix cs4210_spdif_automute()
  ASoC: pcm1681: Fix setting de-emphasis sampling rate selection
  ASoC: ssm4567: Keep TDM_BCLKS in ssm4567_set_dai_fmt
  ASoC: sgtl5000: Fix up define for SGTL5000_SMALL_POP
  ASoC: dapm: Don't add prefix to widget stream name
  ASoC: rt5645: Check if codec is initialized in workqueue handler
  ASoC: Intel: Get correct usage_count value to load firmware
  ASoC: topology: Fix to add dapm mixer info
  ASoC: zx: spdif: Fix devm_ioremap_resource return value check
  ASoC: zx: i2s: Fix devm_ioremap_resource return value check
  ASoC: mediatek: Use platform_of_node for machine drivers
  ASoC: Free card DAPM context on snd_soc_instantiate_card() error path
  ...

9 years agostmmac: fix missing MODULE_LICENSE in stmmac_platform
Joachim Eastwood [Fri, 31 Jul 2015 17:13:22 +0000 (19:13 +0200)]
stmmac: fix missing MODULE_LICENSE in stmmac_platform

Commit 50649ab14982 ("stmmac: drop driver from stmmac platform code")
was a bit overzealous in removing code and dropped the MODULE_*
macro's that are still needed since stmmac_platform can be a module.
Fix this by putting the macro's remvoed in 50649ab14982 back.

This fixes the following errors when used as a module:
  stmmac_platform: module license 'unspecified' taints kernel.
  Disabling lock debugging due to kernel taint
  stmmac_platform: Unknown symbol devm_kmalloc (err 0)
  stmmac_platform: Unknown symbol stmmac_suspend (err 0)
  stmmac_platform: Unknown symbol platform_get_irq_byname (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_remove (err 0)
  stmmac_platform: Unknown symbol platform_get_resource (err 0)
  stmmac_platform: Unknown symbol of_get_phy_mode (err 0)
  stmmac_platform: Unknown symbol of_property_read_u32_array (err 0)
  stmmac_platform: Unknown symbol of_alias_get_id (err 0)
  stmmac_platform: Unknown symbol stmmac_resume (err 0)
  stmmac_platform: Unknown symbol stmmac_dvr_probe (err 0)

Fixes: 50649ab14982 ("stmmac: drop driver from stmmac platform code")
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'gianfar-wol-fixes'
David S. Miller [Fri, 31 Jul 2015 22:41:50 +0000 (15:41 -0700)]
Merge branch 'gianfar-wol-fixes'

Claudiu Manoil says:

====================
gianfar: wol magic packet fixes

These changes were already validated as part of FSL SDK.
Patch 2 fixes occasional wake-on magic packet failures during
traffic, probably due to incorrect traffic stop/ device halt
sequence and incorrect usage of txlock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agogianfar: Enable device wakeup when appropriate
Claudiu Manoil [Fri, 31 Jul 2015 15:38:33 +0000 (18:38 +0300)]
gianfar: Enable device wakeup when appropriate

The wol_en flag is 0 by default anyway, and we have the
following inconsistency: a MAGIC packet wol capable eth
interface is registered as a wake-up source but unable
to wake-up the system as wol_en is 0 (wake-on flag set to 'd').
Calling set_wakeup_enable() at netdev open is just redundant
because wol_en is 0 by default.
Let only ethtool call set_wakeup_enable() for now.

The bflock is obviously obsoleted, its utility has been corroded
over time.  The bitfield flags used today in gianfar are accessed
only on the init/ config path, with no real possibility of
concurrency - nothing that would justify smth. like bflock.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>