GitHub/LineageOS/android_kernel_motorola_exynos9610.git
7 years agoIB/cma: Fix a race condition in iboe_addr_get_sgid()
Bart Van Assche [Mon, 19 Dec 2016 17:00:05 +0000 (18:00 +0100)]
IB/cma: Fix a race condition in iboe_addr_get_sgid()

Code that dereferences the struct net_device ip_ptr member must be
protected with an in_dev_get() / in_dev_put() pair. Hence insert
calls to these functions.

Fixes: commit 7b85627b9f02 ("IB/cma: IBoE (RoCE) IP-based GID addressing")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Fix a memory leak in rxe_qp_cleanup()
Bart Van Assche [Thu, 15 Dec 2016 16:15:07 +0000 (17:15 +0100)]
IB/rxe: Fix a memory leak in rxe_qp_cleanup()

A socket is associated with every QP by the rxe driver but sock_release()
is never called. Add a call to sock_release() in rxe_qp_cleanup().

Fixes: commit 8700e3e7c485 ("Add Soft RoCE driver")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Moni Shoua <monis@mellanox.com>
Cc: Kamal Heib <kamalh@mellanox.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoiw_cxgb4: set correct FetchBurstMax for QPs
Steve Wise [Thu, 15 Dec 2016 16:09:35 +0000 (08:09 -0800)]
iw_cxgb4: set correct FetchBurstMax for QPs

The current QP FetchBurstMax value is 256B, which
is incorrect since a WR can exceed that value. The
result is a partial WR fetched by hardware and
a fatal "bad WR" error posted by the SGE.

So bump the FetchBurstMax to 512B.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branch 'vmw_pvrdma' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:56:21 +0000 (14:56 -0500)]
Merge branch 'vmw_pvrdma' into merge-test

7 years agoIB: Add vmw_pvrdma driver
Adit Ranadive [Mon, 3 Oct 2016 02:10:22 +0000 (19:10 -0700)]
IB: Add vmw_pvrdma driver

This patch series adds a driver for a paravirtual RDMA device. The
device is developed for VMware's Virtual Machines and allows existing RDMA
applications to continue to use existing Verbs API when deployed in VMs
on ESXi. We recently did a presentation in the OFA Workshop [1] regarding
this device.

Description and RDMA Support
============================
The virtual device is exposed as a dual function PCIe device. One part
is a virtual network device (VMXNet3) which provides networking properties
like MAC, IP addresses to the RDMA part of the device. The networking
properties are used to register GIDs required by RDMA applications to
communicate.

These patches add support and all the required infrastructure for
letting applications use such a device. We support the mandatory Verbs API as
well as the base memory management extensions (Local Inv, Send with Inv and
Fast Register Work Requests). We currently support both Reliable Connected
and Unreliable Datagram QPs but do not support Shared Receive Queues
(SRQs).

Also, we support the following types of Work Requests:
 o Send/Receive (with or without Immediate Data)
 o RDMA Write (with or without Immediate Data)
 o RDMA Read
 o Local Invalidate
 o Send with Invalidate
 o Fast Register Work Requests

This version only adds support for version 1 of RoCE. We will add RoCEv2
support in a future patch. We do support registration of both MAC-based
and IP-based GIDs. I have also created a git tree for our user-level driver
[2].

Testing
=======
We have tested this internally for various types of Guest OS - Red Hat,
CentOS, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12
using backported versions of this driver. The tests included several
runs of the performance tests (included with OFED), Intel MPI PingPong
benchmark on OpenMPI, krping for FRWRs. Mellanox has been kind enough
to test the backported version of the driver internally on their hardware
using a VMware provided ESX build. I have also applied and tested this
with Doug's k.o/for-4.9 branch (commit 5603910b). Note that this patch
series should be applied all together. I split out the commits so that
it may be easier to review.

PVRDMA Resources
================
[1] OFA Workshop Presentation -
https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf

[2] Libpvrdma User-level library -
http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: George Zhang <georgezhang@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:44:47 +0000 (14:44 -0500)]
Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp' into merge-test

7 years agoMerge branch 'mlx' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:44:25 +0000 (14:44 -0500)]
Merge branch 'mlx' into merge-test

7 years agoMerge branch 'hfi1' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:44:08 +0000 (14:44 -0500)]
Merge branch 'hfi1' into merge-test

7 years agoMerge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:43:14 +0000 (14:43 -0500)]
Merge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test

7 years agoIB/mlx4: fix improper return value
Pan Bian [Sun, 4 Dec 2016 06:45:38 +0000 (14:45 +0800)]
IB/mlx4: fix improper return value

If uhw->inlen is non-zero, the value of variable err is 0 if the copy
succeeds. Then, if kzalloc() or kmalloc() returns a NULL pointer, the
function returns 0 to its callers. As a result, the callers cannot detect
the error. This patch fixes the bug by assigning "-ENOMEM" to err before
the NULL pointer checks and removing the initialization of err at the
beginning.
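
The pattern can be sketched in user-space C. This is only an
illustration under stated assumptions, not the mlx4 code: query_sketch(),
copy_ok and the use of calloc() in place of kzalloc() are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a query function; not the mlx4 code. */
int query_sketch(int copy_ok, size_t n)
{
	int err;
	void *buf;

	if (!copy_ok)
		return -EFAULT;

	/* Set the error code before the allocation so that the
	 * failure path cannot return a stale 0. */
	err = -ENOMEM;
	buf = n ? calloc(1, n) : NULL;	/* n == 0 simulates a failed allocation */
	if (!buf)
		goto out;

	err = 0;
	free(buf);
out:
	return err;
}
```

With err assigned just before the NULL check, a failed allocation is
reported as -ENOMEM rather than as success.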

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189031
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/ocrdma: fix bad initialization
Pan Bian [Sat, 3 Dec 2016 13:10:21 +0000 (21:10 +0800)]
IB/ocrdma: fix bad initialization

The function ocrdma_mbx_create_ah_tbl() returns the value of status on
errors. However, because status is initialized with 0, 0 will be
returned even on error paths. This patch initializes status with
"-ENOMEM".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188831

Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoinfiniband: nes: return value of skb_linearize should be handled
Zhouyi Zhou [Wed, 7 Dec 2016 07:30:05 +0000 (15:30 +0800)]
infiniband: nes: return value of skb_linearize should be handled

Return value of skb_linearize should be handled in function
nes_netdev_start_xmit.

Compiled in x86_64
Signed-off-by: Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMAINTAINERS: Update Intel RDMA RNIC driver maintainers
Shiraz Saleem [Mon, 5 Dec 2016 17:28:32 +0000 (11:28 -0600)]
MAINTAINERS: Update Intel RDMA RNIC driver maintainers

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
Leon Romanovsky [Sun, 4 Dec 2016 06:47:46 +0000 (08:47 +0200)]
MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers

Remove Mitesh Ahuja <mitesh.ahuja@avagotech.com> from the
maintainers file. This email address appears to be inactive
and causes mail bounces during submissions.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: fix unmap_sg argument
Sebastian Ott [Fri, 2 Dec 2016 13:45:26 +0000 (14:45 +0100)]
IB/core: fix unmap_sg argument

__ib_umem_release calls dma_unmap_sg with a different number of
sg_entries than ib_umem_get uses for dma_map_sg. This might cause
trouble for implementations that merge sglist entries and results
in the following dma debug complaint:

DMA-API: device driver frees DMA sg list with different entry
         count [map count=2] [unmap count=1]

Fix it by using the correct value.
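
A toy model of the mismatch, assuming a mapping layer that merges
adjacent entries. All names (toy_sg, toy_map_sg, toy_unmap_sg) are
hypothetical and only mimic the rule that the unmap call must pass the
entry count given to the map call, not the count the map call returned.

```c
#include <stddef.h>

/* Toy model of a DMA layer; every name here is hypothetical. */
struct toy_sg {
	unsigned long addr;
	unsigned long len;
};

static size_t toy_mapped_nents;	/* entry count remembered at map time */

/* Returns the number of mapped segments. Adjacent entries are merged,
 * so the result may be smaller than nents. */
size_t toy_map_sg(const struct toy_sg *sg, size_t nents)
{
	size_t i, segs = nents ? 1 : 0;

	toy_mapped_nents = nents;
	for (i = 1; i < nents; i++)
		if (sg[i - 1].addr + sg[i - 1].len != sg[i].addr)
			segs++;
	return segs;
}

/* Returns 0 when nents matches the map call, -1 otherwise. */
int toy_unmap_sg(size_t nents)
{
	return nents == toy_mapped_nents ? 0 : -1;
}
```

Mapping two contiguous entries yields one segment; unmapping with that
returned count (1) instead of the original count (2) is the mismatch
the debug message reports.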

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoqede: fix general protection fault may occur on probe
Amrani, Ram [Wed, 23 Nov 2016 08:03:04 +0000 (08:03 +0000)]
qede: fix general protection fault may occur on probe

The recent introduction of qedr driver support in qede causes a GPF when probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids using an uninitialized pointer in such a case. Caught by the kernel test robot.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
Souptick Joarder [Thu, 1 Dec 2016 18:41:59 +0000 (00:11 +0530)]
IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc

In mthca_create_ah(), replace pci_pool_alloc() followed by memset()
with pci_pool_zalloc().
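
A user-space analogy of the same cleanup, assuming malloc()/memset()
and calloc() as stand-ins for the pool allocators. struct ah_sketch and
both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct ah_sketch {
	unsigned int key;
	unsigned char gid[16];
};

/* Before: allocate, then clear by hand. */
struct ah_sketch *ah_alloc_then_memset(void)
{
	struct ah_sketch *ah = malloc(sizeof(*ah));

	if (ah)
		memset(ah, 0, sizeof(*ah));
	return ah;
}

/* After: one call that returns zeroed memory. */
struct ah_sketch *ah_zalloc(void)
{
	return calloc(1, sizeof(struct ah_sketch));
}
```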

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5, calc_sq_size(): Make a debug message more informative
Bart Van Assche [Tue, 6 Dec 2016 01:19:52 +0000 (17:19 -0800)]
mlx5, calc_sq_size(): Make a debug message more informative

Make it clear that qp->sq.wqe_cnt is not the number of WQEs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5: Remove a set-but-not-used variable
Bart Van Assche [Tue, 6 Dec 2016 01:18:27 +0000 (17:18 -0800)]
mlx5: Remove a set-but-not-used variable

This has been detected by building the mlx5 driver with W=1.

Fixes: 1a412fb1caa2 ("{net,IB}/mlx5: Modify QP commands via mlx5 ifc")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5: Use { } instead of { 0 } to init struct
Bart Van Assche [Tue, 6 Dec 2016 01:18:08 +0000 (17:18 -0800)]
mlx5: Use { } instead of { 0 } to init struct

Detected by sparse.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srp: Make writing the add_target sysfs attr interruptible
Bart Van Assche [Mon, 21 Nov 2016 21:58:18 +0000 (13:58 -0800)]
IB/srp: Make writing the add_target sysfs attr interruptible

Avoid that shutdown of srp_daemon is delayed if add_target_mutex is
held by another process.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srp: Make mapping failures easier to debug
Bart Van Assche [Mon, 21 Nov 2016 21:57:41 +0000 (13:57 -0800)]
IB/srp: Make mapping failures easier to debug

Make it easier to figure out what is going on if memory mapping
fails because more memory regions than mr_per_cmd are needed.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srp: Make login failures easier to debug
Bart Van Assche [Mon, 21 Nov 2016 21:57:24 +0000 (13:57 -0800)]
IB/srp: Make login failures easier to debug

If login fails because memory region allocation failed it can be
hard to figure out what happened. Make it easier to figure out
why login failed by logging a message if ib_alloc_mr() fails.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srp: Introduce a local variable in srp_add_one()
Bart Van Assche [Mon, 21 Nov 2016 21:57:07 +0000 (13:57 -0800)]
IB/srp: Introduce a local variable in srp_add_one()

This patch makes the srp_add_one() code more compact and does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
Bart Van Assche [Mon, 21 Nov 2016 21:56:46 +0000 (13:56 -0800)]
IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build

Avoid that the kernel build fails as follows if dynamic debug support
is disabled:

drivers/infiniband/ulp/srp/ib_srp.c:2272:3: error: implicit declaration of function 'DEFINE_DYNAMIC_DEBUG_METADATA'
drivers/infiniband/ulp/srp/ib_srp.c:2272:33: error: 'ddm' undeclared (first use in this function)
drivers/infiniband/ulp/srp/ib_srp.c:2275:39: error: '_DPRINTK_FLAGS_PRINT' undeclared (first use in this function)

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/multicast: Check ib_find_pkey() return value
Bart Van Assche [Mon, 21 Nov 2016 18:22:17 +0000 (10:22 -0800)]
IB/multicast: Check ib_find_pkey() return value

This patch avoids that Coverity complains about not checking the
ib_find_pkey() return value.

Fixes: commit 547af76521b3 ("IB/multicast: Report errors on multicast groups if P_key changes")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIPoIB: Avoid reading an uninitialized member variable
Bart Van Assche [Mon, 21 Nov 2016 18:21:41 +0000 (10:21 -0800)]
IPoIB: Avoid reading an uninitialized member variable

This patch avoids that Coverity reports the following:

    Using uninitialized value port_attr.state when calling printk

Fixes: commit 94232d9ce817 ("IPoIB: Start multicast join process only on active ports")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mad: Fix an array index check
Bart Van Assche [Mon, 21 Nov 2016 18:21:17 +0000 (10:21 -0800)]
IB/mad: Fix an array index check

The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
(80) elements. Hence compare the array index with that value instead
of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
reports the following:

Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
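
The check can be illustrated with a small stand-alone sketch. The
constants mirror the sizes quoted above, but the names carrying a
_SKETCH suffix and both helper functions are hypothetical.

```c
#include <stddef.h>

#define MAX_MGMT_CLASS_SKETCH	80	/* number of elements in the table */
#define MAX_METHODS_SKETCH	128	/* unrelated, larger constant */

struct class_table_sketch {
	void *method_table[MAX_MGMT_CLASS_SKETCH];
};

/* Buggy check: compares against the wrong constant, so indexes
 * 80..127 are accepted although they are out of bounds. */
int class_index_valid_buggy(size_t idx)
{
	return idx < MAX_METHODS_SKETCH;
}

/* Fixed check: compares against the real array size. */
int class_index_valid(size_t idx)
{
	return idx < MAX_MGMT_CLASS_SKETCH;
}
```

Index 127 passes the buggy check and fails the fixed one, which matches
the overrun Coverity describes.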

Fixes: commit b7ab0b19a85f ("IB/mad: Verify mgmt class in received MADs")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Rework special QP creation error path
Bart Van Assche [Mon, 14 Nov 2016 16:44:11 +0000 (08:44 -0800)]
IB/mlx4: Rework special QP creation error path

The special QP creation error path relies on offsetof(struct mlx4_ib_sqp,
qp) == 0. Remove this assumption because that makes the QP creation
code easier to understand.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/srpt: Report login failures only once
Bart Van Assche [Sat, 12 Nov 2016 00:36:06 +0000 (16:36 -0800)]
IB/srpt: Report login failures only once

Report the following message only once if no ACL has been configured
yet for an initiator port:

"Rejected login because no ACL has been configured yet for initiator %s.\n"

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagig@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/usnic: simplify IS_ERR_OR_NULL to IS_ERR
Julia Lawall [Fri, 11 Nov 2016 19:04:26 +0000 (20:04 +0100)]
IB/usnic: simplify IS_ERR_OR_NULL to IS_ERR

The function usnic_ib_qp_grp_get_chunk only returns an ERR_PTR value or a
valid pointer, never NULL.  The same is true of get_qp_res_chunk, which
just returns the result of calling usnic_ib_qp_grp_get_chunk.  Simplify
IS_ERR_OR_NULL to IS_ERR in both cases.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,e;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ IS_ERR(t)

@@
expression t,e,e1;
@@

t = \(usnic_ib_qp_grp_get_chunk(...)\|get_qp_res_chunk(...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ PTR_ERR(t)
... when any
// </smpl>
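
Why IS_ERR() is sufficient here can be shown with a user-space
re-creation of the error-pointer helpers. This is a sketch, not the
kernel's err.h: the lower-case helper names, MAX_ERRNO_SKETCH and
get_chunk_sketch() are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr(long error)
{
	return (void *)error;
}

/* Recover the errno value from an error pointer. */
long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* True for pointers in the top MAX_ERRNO_SKETCH bytes of the address space. */
int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

int is_err_or_null(const void *ptr)
{
	return !ptr || is_err(ptr);
}

static int chunk_storage;

/* Hypothetical getter: returns a valid pointer or an error pointer,
 * never NULL, so the NULL half of is_err_or_null() is dead code. */
void *get_chunk_sketch(int ok)
{
	return ok ? (void *)&chunk_storage : err_ptr(-EINVAL);
}
```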

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Issue DREQ when receiving REQ/REP for stale QP
Hans Westgaard Ry [Fri, 28 Oct 2016 11:14:29 +0000 (13:14 +0200)]
IB/core: Issue DREQ when receiving REQ/REP for stale QP

From "InfiniBand Architecture Specification Volume 1":

  A QP is said to have a stale connection when only one side has
  connection information. A stale connection may result if the remote CM
  had dropped the connection and sent a DREQ but the DREQ was never
  received by the local CM. Alternatively the remote CM may have lost
  all record of past connections because its node crashed and rebooted,
  while the local CM did not become aware of the remote node's reboot
  and therefore did not clean up stale connections.

and:

   A local CM may receive a REQ/REP for a stale connection. It shall
   abort the connection issuing REJ to the REQ/REP. It shall then issue
   DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.

This patch solves a problem with reuse of QPNs. The current codebase
(that is, IPoIB) relies on a REAP mechanism to clean up the structures
in the CM. A problem with this is the time constants governing this
mechanism; they are up to 768 seconds, and the interface may look
unresponsive during that period. Issuing a DREQ (and receiving a DREP)
does the necessary cleanup and the interface comes up.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/nes: use new api ethtool_{get|set}_link_ksettings
Philippe Reynes [Tue, 25 Oct 2016 15:29:47 +0000 (17:29 +0200)]
IB/nes: use new api ethtool_{get|set}_link_ksettings

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/isert: do not ignore errors in dma_map_single()
Alexey Khoroshilov [Fri, 21 Oct 2016 22:01:21 +0000 (01:01 +0300)]
IB/isert: do not ignore errors in dma_map_single()

There are several places, where errors in dma_map_single() are
ignored. The patch fixes them.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Only put mmap_info ref if it exists
Jim Foraker [Tue, 1 Nov 2016 20:44:12 +0000 (13:44 -0700)]
IB/rdmavt: Only put mmap_info ref if it exists

rvt_create_qp() creates qp->ip only when a qp creation request comes from
userspace (udata is not NULL).  If we exceed the number of available
queue pairs however, the error path always attempts to put a kref to this
structure.  If the requestor is inside the kernel, this leads to a crash.

We fix this by checking that qp->ip is not NULL before calling kref_put().
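
A minimal sketch of the guarded error path, assuming a plain integer in
place of a kref. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct mmap_info_sketch {
	int refcount;
};

struct qp_sketch {
	struct mmap_info_sketch *ip;	/* NULL for in-kernel requestors */
};

/* Drop one reference. */
void info_put(struct mmap_info_sketch *ip)
{
	ip->refcount--;
}

/* Error path: only drop the reference if the object was created.
 * Returns 1 if a reference was dropped, 0 if there was nothing to put. */
int qp_error_cleanup(struct qp_sketch *qp)
{
	if (!qp->ip)
		return 0;
	info_put(qp->ip);
	return 1;
}
```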

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Handle the kthread worker using the new API
Petr Mladek [Wed, 19 Oct 2016 12:07:20 +0000 (14:07 +0200)]
IB/rdmavt: Handle the kthread worker using the new API

Use the new API to create and destroy the cq kthread worker.
The API hides some implementation details.

In particular, kthread_create_worker() allocates and initializes
struct kthread_worker. It runs the kthread the right way and stores
task_struct into the worker structure. In addition, the *on_cpu()
variant binds the kthread to the given cpu and the related memory
node.

kthread_destroy_worker() flushes all pending works, stops
the kthread and frees the structure.

This patch does not change the existing behavior. Note that we must
use the on_cpu() variant because the function starts the kthread
and it must bind it to the right CPU before waking. The numa node
is associated for given CPU as well.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Avoid queuing work into a destroyed cq kthread worker
Petr Mladek [Wed, 19 Oct 2016 12:07:19 +0000 (14:07 +0200)]
IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker

The memory barrier is not enough to prevent work from being queued into
a destroyed cq kthread worker. Just imagine the following situation:

CPU1                            CPU2

rvt_cq_enter()
  worker =  cq->rdi->worker;

                                rvt_cq_exit()
                                  rdi->worker = NULL;
                                  smp_wmb();
                                  kthread_flush_worker(worker);
                                  kthread_stop(worker->task);
                                  kfree(worker);

                                  // nothing queued yet =>
                                  // nothing flushed and
                                  // happily stopped and freed

  if (likely(worker)) {
     // true => read before CPU2 acted
     cq->notify = RVT_CQ_NONE;
     cq->triggered++;
     kthread_queue_work(worker, &cq->comptask);

  BANG: worker has been flushed/stopped/freed in the meantime.

This patch solves this by protecting the critical sections by
rdi->n_cqs_lock. It seems that this lock is not much contended
and looks reasonable for this purpose.

One catch is that rvt_cq_enter() might be called from IRQ context.
Therefore we must always take the lock with IRQs disabled to avoid
a possible deadlock.
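
The locking idea can be sketched in user space, assuming a pthread
mutex in place of the spinlock. Disabling IRQs has no user-space
analogue and is not modelled. All struct and function names below are
hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct worker_sketch {
	int queued;
};

struct rdi_sketch {
	pthread_mutex_t lock;		/* stands in for n_cqs_lock */
	struct worker_sketch *worker;	/* NULL once torn down */
};

/* Queue work only if the worker still exists. The check and the use
 * happen under the same lock, so teardown cannot slip in between.
 * Returns 1 if work was queued, 0 otherwise. */
int cq_enter_sketch(struct rdi_sketch *rdi)
{
	int queued = 0;

	pthread_mutex_lock(&rdi->lock);
	if (rdi->worker) {
		rdi->worker->queued++;
		queued = 1;
	}
	pthread_mutex_unlock(&rdi->lock);
	return queued;
}

/* Detach the worker under the lock. The caller destroys the returned
 * worker afterwards, when no new work can reach it. */
struct worker_sketch *cq_exit_sketch(struct rdi_sketch *rdi)
{
	struct worker_sketch *w;

	pthread_mutex_lock(&rdi->lock);
	w = rdi->worker;
	rdi->worker = NULL;
	pthread_mutex_unlock(&rdi->lock);
	return w;
}
```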

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: avoid a -Wmaybe-uninitialize warning
Arnd Bergmann [Tue, 25 Oct 2016 16:16:20 +0000 (18:16 +0200)]
IB/mlx4: avoid a -Wmaybe-uninitialize warning

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: avoid bogus -Wmaybe-uninitialized warning
Arnd Bergmann [Mon, 24 Oct 2016 20:48:21 +0000 (22:48 +0200)]
IB/mlx5: avoid bogus -Wmaybe-uninitialized warning

We get a false-positive warning in linux-next for the mlx5 driver:

infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’:
infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here
infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here

I had a trivial workaround for gcc-5 or higher, but that didn't work
on gcc-4.9 unfortunately.

The only way I found to avoid the warnings for gcc-4.9, short of
initializing each of the arguments first was to change the calling
conventions to separate the error code from the umem pointer. This
avoids casting the error codes from one pointer to another incompatible
pointer, and lets gcc figure out that the data is actually valid
whenever we return successfully.
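
The changed calling convention might look roughly like this. It is a
sketch with hypothetical names (umem_sketch, umem_get_sketch) and far
fewer parameters than the real helper.

```c
#include <errno.h>
#include <stddef.h>

struct umem_sketch {
	int npages;
};

static struct umem_sketch the_umem = { 4 };

/* The error code is the return value. Results come back through
 * out-parameters that are written only on success, so no error code
 * is ever cast to a pointer type. */
int umem_get_sketch(int fail, struct umem_sketch **umem, int *npages)
{
	if (fail)
		return -EINVAL;

	*umem = &the_umem;
	*npages = the_umem.npages;
	return 0;
}
```

A caller tests the integer return value and reads the out-parameters
only when it is 0.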

Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agordma UAPI: Use __kernel_sockaddr_storage
Jason Gunthorpe [Thu, 27 Oct 2016 16:51:17 +0000 (10:51 -0600)]
rdma UAPI: Use __kernel_sockaddr_storage

The kernel side is #ifdef'd to this type, and the UAPI header
should use it directly. It has slightly different alignment
requirements from the usual user space version.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonvmet_rdma: log the connection reject message
Steve Wise [Wed, 26 Oct 2016 19:36:48 +0000 (12:36 -0700)]
nvmet_rdma: log the connection reject message

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoib_isert: log the connection reject message
Steve Wise [Wed, 26 Oct 2016 19:36:48 +0000 (12:36 -0700)]
ib_isert: log the connection reject message

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agords_rdma: log the connection reject message
Steve Wise [Wed, 26 Oct 2016 19:36:48 +0000 (12:36 -0700)]
rds_rdma: log the connection reject message

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoib_iser: log the connection reject message
Steve Wise [Wed, 26 Oct 2016 19:36:47 +0000 (12:36 -0700)]
ib_iser: log the connection reject message

Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonvme-rdma: use rdma connection reject helper functions
Steve Wise [Wed, 26 Oct 2016 19:36:47 +0000 (12:36 -0700)]
nvme-rdma: use rdma connection reject helper functions

Also add nvme cm status strings and use them.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agordma_cm: add rdma_consumer_reject_data helper function
Steve Wise [Wed, 26 Oct 2016 19:36:47 +0000 (12:36 -0700)]
rdma_cm: add rdma_consumer_reject_data helper function

rdma_consumer_reject_data() will return the private data pointer
and length if any is available.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agordma_cm: add rdma_is_consumer_reject() helper function
Steve Wise [Wed, 26 Oct 2016 19:36:47 +0000 (12:36 -0700)]
rdma_cm: add rdma_is_consumer_reject() helper function

Return true if the peer consumer application rejected the
connection attempt.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agordma_cm: add rdma_reject_msg() helper function
Steve Wise [Wed, 26 Oct 2016 19:36:40 +0000 (12:36 -0700)]
rdma_cm: add rdma_reject_msg() helper function

rdma_reject_msg() returns a pointer to a string message associated with
the transport reject reason codes.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoqedr: remove pointless NULL check in qedr_post_send()
Wei Yongjun [Wed, 2 Nov 2016 13:11:32 +0000 (13:11 +0000)]
qedr: remove pointless NULL check in qedr_post_send()

Remove pointless NULL check for 'wr' in qedr_post_send().

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoqedr: Use list_move_tail instead of list_del/list_add_tail
Wei Yongjun [Sat, 29 Oct 2016 16:19:53 +0000 (16:19 +0000)]
qedr: Use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().
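
The equivalence can be seen in a minimal user-space doubly linked list.
This is a sketch, not the kernel's list.h: the type and the functions
carrying a _sketch suffix are hypothetical.

```c
struct list_node_sketch {
	struct list_node_sketch *next, *prev;
};

/* An empty list head points at itself. */
void list_init_sketch(struct list_node_sketch *h)
{
	h->next = h->prev = h;
}

/* Insert n just before head, that is, at the tail of the list. */
void list_add_tail_sketch(struct list_node_sketch *n,
			  struct list_node_sketch *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink e from whatever list it is on. */
void list_del_sketch(struct list_node_sketch *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* One call that does list_del_sketch() + list_add_tail_sketch(). */
void list_move_tail_sketch(struct list_node_sketch *e,
			   struct list_node_sketch *head)
{
	list_del_sketch(e);
	list_add_tail_sketch(e, head);
}
```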

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoqedr: Fix possible memory leak in qedr_create_qp()
Wei Yongjun [Fri, 28 Oct 2016 16:33:47 +0000 (16:33 +0000)]
qedr: Fix possible memory leak in qedr_create_qp()

'qp' is allocated in qedr_create_qp() and should be freed before leaving
from the error handling cases, otherwise it will cause a memory leak.
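
A sketch of the error-path cleanup, assuming calloc()/free() in place
of the kernel allocators. create_qp_sketch(), qp_obj and the
live_allocs counter are hypothetical and exist only to make the leak
observable.

```c
#include <errno.h>
#include <stdlib.h>

struct qp_obj {
	int id;
};

int live_allocs;	/* allocations not yet freed by this file */

/* Returns 0 and stores the object in *out, or a negative errno. */
int create_qp_sketch(int setup_fails, struct qp_obj **out)
{
	struct qp_obj *qp = calloc(1, sizeof(*qp));

	if (!qp)
		return -ENOMEM;
	live_allocs++;

	if (setup_fails)
		goto err;

	*out = qp;
	return 0;

err:
	/* Without this free() every failed setup would leak qp. */
	free(qp);
	live_allocs--;
	return -EIO;
}
```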

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoqedr: return -EINVAL if pd is null and avoid null ptr dereference
Colin Ian King [Tue, 18 Oct 2016 18:39:28 +0000 (19:39 +0100)]
qedr: return -EINVAL if pd is null and avoid null ptr dereference

Currently, if pd is null then we hit a null pointer dereference
on accessing pd->pd_id.  Instead of just printing an error message
we should also return -EINVAL immediately.
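
In outline, and with hypothetical names (pd_sketch, dealloc_pd_sketch),
the early return looks like this:

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct pd_sketch {
	int pd_id;
};

/* Returns -EINVAL for a NULL pd, otherwise the pd id. */
int dealloc_pd_sketch(const struct pd_sketch *pd)
{
	if (!pd) {
		fprintf(stderr, "invalid PD received\n");
		return -EINVAL;	/* return before pd->pd_id is touched */
	}

	return pd->pd_id;
}
```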

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mad: Eliminate redundant SM class version defines for OPA
Hal Rosenstock [Tue, 18 Oct 2016 17:20:29 +0000 (13:20 -0400)]
IB/mad: Eliminate redundant SM class version defines for OPA

and rename class version define to indicate SM rather than SMP or SMI

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Properly adjust rate limit on QP state transitions
Bodong Wang [Thu, 1 Dec 2016 11:43:16 +0000 (13:43 +0200)]
IB/mlx5: Properly adjust rate limit on QP state transitions

- Add MODIFY_QP_EX CMD to extend modify_qp.
- Rate limit will be updated in the following state transitions: RTR2RTS,
  RTS2RTS. The limit will be removed when SQ is in RST and ERR state.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Extend modify_qp and support packet pacing
Bodong Wang [Thu, 1 Dec 2016 11:43:15 +0000 (13:43 +0200)]
IB/uverbs: Extend modify_qp and support packet pacing

A new uverbs command ib_uverbs_ex_modify_qp is added to support more QP
attributes. User driver should choose to call the legacy/extended API
based on input mask.

IB_USER_LEGACY_LAST_QP_ATTR_MASK is added to indicate the maximum bit position
which supports legacy ib_uverbs_modify_qp.
IB_USER_LAST_QP_ATTR_MASK indicates the maximum bit position
which supports ib_uverbs_ex_modify_qp; the value of this mask should be
updated if a new mask is added later.

Along with this change, rate_limit is supported by the extended command,
user driver could use it to control packet pacing.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Support rate limit for packet pacing
Bodong Wang [Thu, 1 Dec 2016 11:43:14 +0000 (13:43 +0200)]
IB/core: Support rate limit for packet pacing

Add new member rate_limit to ib_qp_attr which holds the packet pacing rate
in kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.
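
The rule above can be sketched as follows. This is only an illustration
under stated assumptions: the enum and every example_ name are hypothetical
and are not the ib_core definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical QP states for illustration; not the ib_core enum. */
enum example_qp_state { EX_RESET, EX_INIT, EX_RTR, EX_RTS, EX_ERR };

/* A rate of 0 kbps means "unlimited", as described above. */
static bool example_rate_is_unlimited(uint32_t rate_limit_kbps)
{
	return rate_limit_kbps == 0;
}

/* The rate limit may accompany RTR->RTS and RTS->RTS on RAW QPs only. */
static bool example_rate_limit_allowed(bool is_raw_qp,
				       enum example_qp_state cur,
				       enum example_qp_state next)
{
	if (!is_raw_qp)
		return false;
	return next == EX_RTS && (cur == EX_RTR || cur == EX_RTS);
}
```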

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 packet pacing capabilities when querying device
Bodong Wang [Thu, 1 Dec 2016 11:43:13 +0000 (13:43 +0200)]
IB/mlx5: Report mlx5 packet pacing capabilities when querying device

Enable mlx5 based hardware to report packet pacing capabilities
from kernel to user space. Packet pacing allows limiting the rate to any
number between the maximum and minimum, based on user settings.

The capabilities are exposed to user space through query_device by uhw.
The following capabilities are reported:

1. The maximum and minimum rate limit in kbps supported by packet pacing.
2. Bitmap showing which QP types are supported by packet pacing operation.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Support RAW Ethernet when RoCE is disabled
Or Gerlitz [Sun, 27 Nov 2016 14:51:36 +0000 (16:51 +0200)]
IB/mlx5: Support RAW Ethernet when RoCE is disabled

On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.

This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. To that end, enhance the relevant
driver flows such that we do create a device instance in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Rename RoCE related helpers to reflect being Eth ones
Or Gerlitz [Sun, 27 Nov 2016 14:51:35 +0000 (16:51 +0200)]
IB/mlx5: Rename RoCE related helpers to reflect being Eth ones

This is a pre-step towards having mlx5 IB device also over Eth ports where
RoCE is not supported. We change the roce enable/disable and roce_lag
init/fini function names to have _eth instead of _roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Refactor registration to netdev notifier
Or Gerlitz [Sun, 27 Nov 2016 14:51:34 +0000 (16:51 +0200)]
IB/mlx5: Refactor registration to netdev notifier

Refactor the netdev notifier registration into a small helper function.

This is a pre-step towards having mlx5 IB device over an Ethernet port
which doesn't support RoCE. Also, rename the de-registration helper
and name the new helper after the netdev notifier rather than roce, to make
it clear this is not only used with roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Use u64 for UMR length
Maor Gottlieb [Sun, 27 Nov 2016 13:18:22 +0000 (15:18 +0200)]
IB/mlx5: Use u64 for UMR length

The fast_registration length is used to convey length for memory
registrations through UMR which can be of any size up to 2^64.

Change the length type to be u64.

Fixes: 968e78dd9644 ('IB/mlx5: Enhance UMR support to allow partial page table update')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Avoid system crash when enabling many VFs
Eli Cohen [Sun, 27 Nov 2016 13:18:21 +0000 (15:18 +0200)]
IB/mlx5: Avoid system crash when enabling many VFs

When enabling many VFs, the total number of DMA mappings increases
significantly. This causes DMA allocations to take a lot of time
since they are serialized in the kernel.

As a result, the driver enters a fatal condition due to a
timeout and the system hangs. To recover from this we disable
MR cache for VFs.

PFs will still have a full cache and VFs cache can be manipulated
as usual after driver load.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Assign SRQ type earlier
Maor Gottlieb [Sun, 27 Nov 2016 13:18:20 +0000 (15:18 +0200)]
IB/mlx5: Assign SRQ type earlier

Move the SRQ type assignment to be before actually using it
in create_srq_user() and in create_srq_kernel() functions.

Fixes: af1ba291c5e4 ('{net, IB}/mlx5: Refactor internal SRQ API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Fix out-of-range array index in destroy qp flow
Jack Morgenstein [Sun, 27 Nov 2016 13:18:19 +0000 (15:18 +0200)]
IB/mlx4: Fix out-of-range array index in destroy qp flow

For non-special QPs, the port value becomes non-zero only at the
RESET-to-INIT transition. If the QP has not undergone that transition,
its port number value is still zero.

If such a QP is destroyed before being moved out of the RESET state,
subtracting one from the qp port number results in a negative value.
Using that negative value as an index into the qp1_proxy array
results in an out-of-bounds array reference.

Fix this by testing that the QP type is one that uses qp1_proxy before
using the port number. For special QPs of all types, the port number is
specified at QP creation time.
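
The ordering of the two checks can be sketched as follows. This is a
simplified illustration, not the mlx4 code: the struct, the
uses_qp1_proxy flag and the port count are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_NUM_PORTS 2

struct example_port_qp {
	bool uses_qp1_proxy; /* hypothetical stand-in for the QP type test */
	int port;            /* 1-based; 0 until the RESET-to-INIT transition */
};

/*
 * Returns the 0-based slot in a per-port array, or -1 when the QP must
 * not touch the array at all.
 */
static int example_proxy_slot(const struct example_port_qp *qp)
{
	if (!qp->uses_qp1_proxy)
		return -1; /* tested first, so port == 0 is never used as an index */
	assert(qp->port >= 1 && qp->port <= EXAMPLE_NUM_PORTS);
	return qp->port - 1;
}
```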

Fixes: 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Make create/destroy_ah available to userspace
Moni Shoua [Wed, 23 Nov 2016 06:23:26 +0000 (08:23 +0200)]
IB/mlx5: Make create/destroy_ah available to userspace

Advertise that create_ah and destroy_ah verbs are accessible from
uverbs interface.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Use kernel driver to help userspace create ah
Moni Shoua [Wed, 23 Nov 2016 06:23:25 +0000 (08:23 +0200)]
IB/mlx5: Use kernel driver to help userspace create ah

Resolving a MAC address for a given IP address in userspace is inefficient.
This patch lets the mlx5 user driver use the kernel driver to resolve the MAC
and get the answer in the private section of the response.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Let create_ah return extended response to user
Moni Shoua [Wed, 23 Nov 2016 06:23:24 +0000 (08:23 +0200)]
IB/core: Let create_ah return extended response to user

Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for mlx5 driver to resolve destination
mac address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's Infiniband HCA (SIF).

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report that device has udata response in create_ah
Moni Shoua [Wed, 23 Nov 2016 06:23:23 +0000 (08:23 +0200)]
IB/mlx5: Report that device has udata response in create_ah

To make mlx5 user driver aware of whether kernel driver returns dmac
in user data response add a new flag that will be returned back to
user-space through alloc_ucontext.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Change ib_resolve_eth_dmac to use it in create AH
Moni Shoua [Wed, 23 Nov 2016 06:23:22 +0000 (08:23 +0200)]
IB/core: Change ib_resolve_eth_dmac to use it in create AH

The function ib_resolve_eth_dmac() requires struct qp_attr * and
qp_attr_mask as parameters while the function might be useful to resolve
dmac for address handles. This patch changes the signature of the
function so it can be used in the flow of creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support to match inner packet fields
Moses Reuben [Mon, 14 Nov 2016 17:04:52 +0000 (19:04 +0200)]
IB/mlx5: Add support to match inner packet fields

Add support to match packet fields which are tunneled,
i.e. support matching the header of the inner packet. The spec type for such
a match is the bitwise OR of the original header type and IB_FLOW_SPEC_INNER.

The combination of IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL does not
need to be checked, because the IB core has this check already.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Introduce inner flow steering
Moses Reuben [Mon, 14 Nov 2016 17:04:51 +0000 (19:04 +0200)]
IB/core: Introduce inner flow steering

For a tunneled packet which contains external and internal headers,
we refer to the external headers as "outer fields" and the internal
headers as "inner fields".

Example of a tunneled packet:

{ L2 | L3 | L4 | tunnel header | L2 | L3 | L4 | data }
  |     |    |         |         |    |    |
{       outer fields           }{ inner fields }

This patch introduces a new flag for flow steering rules
- IB_FLOW_SPEC_INNER - which specifies that the rule applies
to the inner fields, rather than to the outer fields of the packet.
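
How such a flag combines with a spec type can be sketched as follows. The
numeric values and the example_ names are illustrative assumptions; the
kernel headers are the authority for the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; consult the kernel headers for the real ones. */
#define EXAMPLE_SPEC_IPV4  0x30u
#define EXAMPLE_SPEC_INNER 0x100u

/* Mark a spec type as applying to the inner headers. */
static uint32_t example_make_inner(uint32_t spec_type)
{
	return spec_type | EXAMPLE_SPEC_INNER;
}

/* True if the rule targets the inner fields of a tunneled packet. */
static bool example_is_inner(uint32_t spec_type)
{
	return (spec_type & EXAMPLE_SPEC_INNER) != 0;
}

/* Recover the underlying protocol type with the flag stripped. */
static uint32_t example_base_type(uint32_t spec_type)
{
	return spec_type & ~EXAMPLE_SPEC_INNER;
}
```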

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Support Vxlan tunneling specification
Moses Reuben [Mon, 14 Nov 2016 17:04:50 +0000 (19:04 +0200)]
IB/mlx5: Support Vxlan tunneling specification

Add support to receive specific Vxlan packet in ConnectX-4.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Add support for Vxlan protocol
Moses Reuben [Mon, 14 Nov 2016 17:04:49 +0000 (19:04 +0200)]
IB/uverbs: Add support for Vxlan protocol

Add ib_uverbs_flow_spec_tunnel to define the rule to match Vxlan.
The type, size and reserved fields are identical to those of the other
protocols and are used to identify the spec.
The tunnel id is the vni value of the Vxlan protocol. It is used
as part of the steering rule and is limited by the mask.
The steering rule configured on the hardware does a match
according to vni and other protocols.
In the same way as for the other protocols that we match,
the unique fields of each protocol are represented in
the val and the mask.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Align structure ib_flow_spec_type
Moses Reuben [Mon, 14 Nov 2016 17:04:48 +0000 (19:04 +0200)]
IB/core: Align structure ib_flow_spec_type

Aligned the structure ib_flow_spec_type indentation,
after adding a new definition.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Add flow spec tunneling support
Moses Reuben [Mon, 14 Nov 2016 17:04:47 +0000 (19:04 +0200)]
IB/core: Add flow spec tunneling support

To support tunneling that can be used by the QP,
both struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter can be
used for more IP or UDP based tunneling protocols (e.g. NVGRE, GRE, etc).

IB_FLOW_SPEC_VXLAN_TUNNEL type flow specification is added to use this
functionality and match specific Vxlan packets.

Similar to IPv6, we check overflow of the vni value by
comparing with the maximum size.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support for CQE compressing
Bodong Wang [Mon, 31 Oct 2016 10:16:45 +0000 (12:16 +0200)]
IB/mlx5: Add support for CQE compressing

CQE compressing reduces PCI overhead by coalescing and compressing
multiple CQEs into a single merged CQE. Successful compressing
improves message rate especially for small packet traffic.

CQE compressing is supported for all 64B CQE formats (with certain
limitations) generated by RQ/Responder or by SQ/Requestor.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 CQE compression caps during query
Bodong Wang [Mon, 31 Oct 2016 10:16:44 +0000 (12:16 +0200)]
IB/mlx5: Report mlx5 CQE compression caps during query

The capabilities include:
- Max number of compressed and aggregated CQEs in a single session,
  while zero means unsupported.
- For Responder, there are two formats of mini CQE: mini CQE with Rx
  hash and mini CQE with checksum. They're mutually exclusive.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 multi packet WQE caps during query
Bodong Wang [Mon, 31 Oct 2016 10:15:21 +0000 (12:15 +0200)]
IB/mlx5: Report mlx5 multi packet WQE caps during query

Whether the hardware supports multi packet WQE is
exposed to user space through query_device by uhw.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Report multi packet WQE capabilities
Leon Romanovsky [Mon, 31 Oct 2016 10:15:20 +0000 (12:15 +0200)]
net/mlx5: Report multi packet WQE capabilities

Multi packet WQE enables sending multiple fixed-size packets
using a single WQE. The exposed field reports such HW support.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Increase max number of completions to 32k
Yonatan Cohen [Wed, 16 Nov 2016 08:39:16 +0000 (10:39 +0200)]
IB/rxe: Increase max number of completions to 32k

Increase limit of max CQE from 8K to 32K to allow demanding
applications to work over SoftRoCE with same configuration
as most RoCEv2 HW vendors have.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Check if GRH is available before using it
Eran Ben Elisha [Thu, 10 Nov 2016 09:31:01 +0000 (11:31 +0200)]
IB/mlx4: Check if GRH is available before using it

Before reading GRH attributes, we need to make sure the AH contains a GRH,
and in addition, initialize the GID type.

Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs
Eran Ben Elisha [Thu, 10 Nov 2016 09:31:00 +0000 (11:31 +0200)]
IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs

According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.

If it isn't set we want to ensure that allocating NET_IF QPs fails. We do so
by filling out the allocation bitmap. This way, the NET_IF QPs allocating
function won't find any free QP and will fail.

Fixes: c1c98501121e ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Reorganize structures to align with HW capabilities
Henry Orosco [Tue, 6 Dec 2016 22:16:20 +0000 (16:16 -0600)]
i40iw: Reorganize structures to align with HW capabilities

Some resources are incorrectly organized and at odds with
HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS
and statistics belong in a VSI.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix incorrect check for error
Mustafa Ismail [Tue, 6 Dec 2016 21:49:35 +0000 (15:49 -0600)]
i40iw: Fix incorrect check for error

In i40iw_ieq_handle_partial() the check for !status is incorrect.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Assign MSS only when it is a new MTU
Mustafa Ismail [Tue, 6 Dec 2016 21:49:34 +0000 (15:49 -0600)]
i40iw: Assign MSS only when it is a new MTU

Currently we are changing the MSS regardless of whether
there is a change or not in MTU. Fix to make the
assignment of MSS dependent on an MTU change.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix race condition in terminate timer's handler
Shiraz Saleem [Tue, 6 Dec 2016 21:49:33 +0000 (15:49 -0600)]
i40iw: Fix race condition in terminate timer's handler

Add a QP reference when terminate timer is started to ensure
the destroy QP doesn't race ahead to free the QP while it is being
referenced in the terminate timer's handler.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix memory leak in CQP destroy when in reset
Mustafa Ismail [Tue, 6 Dec 2016 21:49:32 +0000 (15:49 -0600)]
i40iw: Fix memory leak in CQP destroy when in reset

On a device close, the control QP (CQP) is destroyed by calling
cqp_destroy which destroys the CQP and frees its SD buffer memory.
However, if the reset flag is true, cqp_destroy is never called and
leads to a memory leak on SD buffer memory. Fix this by always calling
cqp_destroy, on device close, regardless of reset. The exception to this is
when CQP create fails. In this case, the SD buffer memory is already
freed on an error check and there is no need to call cqp_destroy.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix QP flush to not hang on empty queues or failure
Shiraz Saleem [Tue, 6 Dec 2016 21:49:31 +0000 (15:49 -0600)]
i40iw: Fix QP flush to not hang on empty queues or failure

When flushing a QP that has no pending work requests, signal completion
to unblock i40iw_drain_sq and i40iw_drain_rq which are waiting on
completion for iwqp->sq_drained and iwqp->rq_drained respectively.
Also, signal completion if flush QP fails to prevent the drain SQ or RQ
from being blocked indefinitely.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix double free of QP
Mustafa Ismail [Tue, 6 Dec 2016 21:49:30 +0000 (15:49 -0600)]
i40iw: Fix double free of QP

A QP can be double freed if i40iw_cm_disconn() is
called while it is currently being freed by
i40iw_rem_ref(). The fix in i40iw_cm_disconn() will
first check if the QP is already freed before
making another request for the QP to be freed.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Use correct src address in memcpy to rdma stats counters
Shiraz Saleem [Fri, 11 Nov 2016 16:55:41 +0000 (10:55 -0600)]
i40iw: Use correct src address in memcpy to rdma stats counters

hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats().
Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats
data into rdma_hw_stats counters.
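
The difference can be sketched as follows. The struct layout, the counter
count and the example_ names are hypothetical and only illustrate the
pointer usage, not the i40iw definitions.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_NUM_COUNTERS 4

struct example_dev_stats {
	uint64_t counters[EXAMPLE_NUM_COUNTERS];
};

/*
 * src is already a pointer to the statistics, so it is passed to
 * memcpy() directly. Passing &src would copy the bytes of the pointer
 * variable itself (and whatever follows it) instead of the counters.
 */
static void example_copy_stats(uint64_t *dst,
			       const struct example_dev_stats *src)
{
	memcpy(dst, src, sizeof(*src));
}
```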

Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic")

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG
Thomas Huth [Wed, 5 Oct 2016 11:55:38 +0000 (13:55 +0200)]
i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG

The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are
apparently bad - they are using the logical "&&" operation which
does not make sense here. It should have been a bitwise "&" instead.
Since the macros seem to be completely unused, let's simply remove
them so that nobody accidentally uses them in the future. And while
we're at it, also remove the unused macro I40IW_CREATE_STAG.
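
The difference between the two operators can be seen in a small sketch.
The mask value and the helper names are hypothetical and are not the
removed i40iw definitions.

```c
#include <stdint.h>

/* Hypothetical mask for illustration only; not the i40iw definition. */
#define EXAMPLE_KEY_MASK 0x000000ffu

/* Logical AND: yields only 0 or 1, so the key bits are lost. */
static inline uint32_t example_key_logical(uint32_t stag)
{
	return stag && EXAMPLE_KEY_MASK;
}

/* Bitwise AND: keeps exactly the bits selected by the mask. */
static inline uint32_t example_key_bitwise(uint32_t stag)
{
	return stag & EXAMPLE_KEY_MASK;
}
```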

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Hold refs when running tasklets
Andrew Boyer [Mon, 5 Dec 2016 13:43:21 +0000 (08:43 -0500)]
IB/rxe: Hold refs when running tasklets

It might be possible for all of a QP's references to be dropped
while one of that QP's tasklets is running.

For example, the completer might run during QP destroy.
If qp->valid is false, it will drop all of the packets on
the resp_pkts list, potentially removing the last reference.
Then it tries to advance the SQ consumer pointer. If the
SQ's buffer has already been destroyed, the system will
panic.

To be safe, hold a reference on the QP for the duration
of each tasklet.
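
The pattern can be sketched with a toy reference count. This is not the
rxe code; the struct, the counter and all example_ helpers are
hypothetical, and a real implementation would use the kernel's reference
counting primitives.

```c
#include <stdbool.h>

struct example_ref_qp {
	int refcnt;  /* number of outstanding references */
	bool freed;  /* set when the last reference is dropped */
};

static void example_get(struct example_ref_qp *qp)
{
	qp->refcnt++;
}

static void example_put(struct example_ref_qp *qp)
{
	if (--qp->refcnt == 0)
		qp->freed = true; /* real code would free the QP here */
}

/* Work that happens to drop a reference, e.g. by discarding a packet. */
static void example_drop_ref(struct example_ref_qp *qp)
{
	example_put(qp);
}

/*
 * Run one unit of work while holding a reference of our own. Returns
 * true if the QP was still alive after the work function returned.
 */
static bool example_run_task(struct example_ref_qp *qp,
			     void (*work)(struct example_ref_qp *))
{
	bool alive;

	example_get(qp);
	work(qp);
	alive = !qp->freed;
	example_put(qp);
	return alive;
}
```

Without the example_get()/example_put() pair around work(), the work
function in this sketch would drop the last reference while the task is
still using the QP.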

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Wait for tasklets to finish before tearing down QP
Andrew Boyer [Mon, 5 Dec 2016 13:43:20 +0000 (08:43 -0500)]
IB/rxe: Wait for tasklets to finish before tearing down QP

The system may crash when a malformed request is received and
the error is detected by the responder.

NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000
NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip>

The responder generates a receive error on node B since the incoming
SEND is oversized. If the client tears down the QP before the responder
or the completer finish running, a page fault may occur.

The fix makes the destroy operation spin until the tasks complete, which
appears to be the original intent of the design.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Fix ref leak in duplicate_request()
Andrew Boyer [Wed, 23 Nov 2016 17:39:24 +0000 (12:39 -0500)]
IB/rxe: Fix ref leak in duplicate_request()

A ref was added after the call to skb_clone().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Fix ref leak in rxe_create_qp()
Andrew Boyer [Wed, 23 Nov 2016 17:39:23 +0000 (12:39 -0500)]
IB/rxe: Fix ref leak in rxe_create_qp()

The udata->inlen error path needs to clean up the ref
added by rxe_alloc().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS
Andrew Boyer [Wed, 23 Nov 2016 17:39:22 +0000 (12:39 -0500)]
IB/rxe: Add support for IB_CQ_REPORT_MISSED_EVENTS

Peek at the CQ after arming it so that we can return a hint.
This avoids missed completions due to a race between posting
CQEs and arming the CQ.
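
The idea can be sketched with a toy model rather than the rxe code. The
struct and function names are hypothetical; the return convention (a
positive hint when completions are already queued) follows the
description above.

```c
#include <stdbool.h>

struct example_cq {
	unsigned int pending; /* completions queued but not yet polled */
	bool armed;           /* true once notification was requested */
};

/*
 * Arm the CQ, then peek at it. Returns 1 as a hint when the caller
 * asked for missed-event reporting and completions are already queued,
 * otherwise 0.
 */
static int example_req_notify(struct example_cq *cq, bool report_missed)
{
	cq->armed = true;
	if (report_missed && cq->pending > 0)
		return 1;
	return 0;
}
```

A caller that receives the hint should poll the CQ again instead of
waiting for an event.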

For example, CM teardown waits on MAD requests to complete with
ib_cq_poll_work(). Without this fix, the last completion might be
left on the CQ, hanging the kthread doing the teardown.

The console backtraces look like this:

[ 4199.911284] Call Trace:
[ 4199.911401]  [<ffffffff9657fe95>] schedule+0x35/0x80
[ 4199.911556]  [<ffffffff965830df>] schedule_timeout+0x22f/0x2c0
[ 4199.911727]  [<ffffffff9657f7a8>] ? __schedule+0x368/0xa20
[ 4199.911891]  [<ffffffff96580903>] wait_for_completion+0xb3/0x130
[ 4199.912067]  [<ffffffff960a17e0>] ? wake_up_q+0x70/0x70
[ 4199.912243]  [<ffffffffc074a06d>] cm_destroy_id+0x13d/0x450 [ib_cm]
[ 4199.912422]  [<ffffffff961615d5>] ? printk+0x57/0x73
[ 4199.912578]  [<ffffffffc074a390>] ib_destroy_cm_id+0x10/0x20 [ib_cm]
[ 4199.912759]  [<ffffffffc076098c>] rdma_destroy_id+0xac/0x340 [rdma_cm]
[ 4199.912941]  [<ffffffffc076f2cc>] 0xffffffffc076f2cc

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Add support for zero-byte operations
Andrew Boyer [Wed, 23 Nov 2016 17:39:21 +0000 (12:39 -0500)]
IB/rxe: Add support for zero-byte operations

The last_psn algorithm fails in the zero-byte case: it calculates
first_psn = N, last_psn = N-1. This makes the operation unretryable since
the res structure will fail the (first_psn <= psn <= last_psn) test in
find_resource().

While here, use BTH_PSN_MASK to mask the calculated last_psn.
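
The arithmetic can be sketched as follows. This is an illustration under
assumptions, not the rxe code: the helper names are hypothetical and
EXAMPLE_PSN_MASK only mirrors the 24-bit width of a PSN.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PSN_MASK 0x00ffffffu /* PSNs are 24 bits wide */

/* Original calculation: for len == 0 this yields first_psn - 1. */
static uint32_t example_last_psn_naive(uint32_t first_psn, uint32_t len,
				       uint32_t mtu)
{
	assert(mtu > 0);
	return first_psn + (len + mtu - 1) / mtu - 1;
}

/* Sketch of a fix: a zero-byte operation occupies exactly one PSN. */
static uint32_t example_last_psn_fixed(uint32_t first_psn, uint32_t len,
				       uint32_t mtu)
{
	assert(mtu > 0);
	if (len == 0)
		return first_psn & EXAMPLE_PSN_MASK;
	return (first_psn + (len + mtu - 1) / mtu - 1) & EXAMPLE_PSN_MASK;
}
```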

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Unblock loopback by moving skb_out increment
Andrew Boyer [Wed, 23 Nov 2016 17:39:20 +0000 (12:39 -0500)]
IB/rxe: Unblock loopback by moving skb_out increment

skb_out is decremented in rxe_skb_tx_dtor(), which is not called in the
loopback() path. Move the increment to the send() path rather than
rxe_xmit_packet().

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Don't update the response PSN unless it's going forwards
Andrew Boyer [Wed, 23 Nov 2016 17:39:19 +0000 (12:39 -0500)]
IB/rxe: Don't update the response PSN unless it's going forwards

A client might post a read followed by a send. The partner receives
and acknowledges both transactions, posting an RCQ entry for the
send, but something goes wrong with the read ACK. When the client
retries the read, the partner's responder processes the duplicate
read but incorrectly resets the PSN to the value preceding the
original send. When the duplicate send arrives, the responder cannot
tell that it is a duplicate, so the responder generates a duplicate
RCQ entry, confusing the client.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Advance the consumer pointer before posting the CQE
Andrew Boyer [Wed, 23 Nov 2016 17:39:18 +0000 (12:39 -0500)]
IB/rxe: Advance the consumer pointer before posting the CQE

A simple userspace application might poll the CQ, find a completion,
and then attempt to post a new WQE to the SQ. A spurious error can
occur if the userspace application detects a full SQ in the instant
before the kernel is able to advance the SQ consumer pointer.

This is noticeable when using single-entry SQs with ibv_rc_pingpong
if lots of kernel and userspace library debugging is enabled.

Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>