Chuck Lever [Thu, 7 Jan 2016 19:49:20 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_req_map
To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:12 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_op_ctxt
When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d915c ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.
Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.
Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
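The per-connection pre-allocation described in the two svcrdma commits above can be sketched in userspace C as a fixed free list. This is a minimal sketch, not the svcrdma code. struct op_ctxt, struct conn_ctxt_pool, pool_init(), ctxt_get(), ctxt_put() and pool_destroy() are hypothetical names, a C11 atomic_flag spin lock stands in for a kernel spinlock, and the pool size is assumed to be chosen at connection setup. All allocation happens up front, where a failure can still be handled, so the get and put paths neither allocate nor sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large (~6KB) per-request context. */
struct op_ctxt {
	struct op_ctxt *next;   /* free-list linkage */
	char payload[6144];
};

/* Hypothetical per-connection pool. */
struct conn_ctxt_pool {
	atomic_flag lock;       /* spin lock: get/put never sleep */
	struct op_ctxt *free;   /* idle contexts */
	size_t count;           /* number of contexts on the free list */
};

static void pool_lock(struct conn_ctxt_pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		;               /* spin */
}

static void pool_unlock(struct conn_ctxt_pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Connection setup: the only place that allocates. On failure the
 * caller refuses the connection and calls pool_destroy(). */
static bool pool_init(struct conn_ctxt_pool *p, size_t n)
{
	atomic_flag_clear(&p->lock);
	p->free = NULL;
	p->count = 0;
	for (size_t i = 0; i < n; i++) {
		struct op_ctxt *c = malloc(sizeof(*c));
		if (!c)
			return false;
		c->next = p->free;
		p->free = c;
		p->count++;
	}
	return true;
}

/* Hot path: no allocation. Returns NULL only if the pool was sized
 * too small for the number of outstanding requests. */
static struct op_ctxt *ctxt_get(struct conn_ctxt_pool *p)
{
	pool_lock(p);
	struct op_ctxt *c = p->free;
	if (c) {
		p->free = c->next;
		p->count--;
	}
	pool_unlock(p);
	return c;
}

static void ctxt_put(struct conn_ctxt_pool *p, struct op_ctxt *c)
{
	pool_lock(p);
	c->next = p->free;
	p->free = c;
	p->count++;
	pool_unlock(p);
}

/* Connection teardown: all contexts must have been put back first. */
static void pool_destroy(struct conn_ctxt_pool *p)
{
	while (p->free) {
		struct op_ctxt *c = p->free;
		p->free = c->next;
		free(c);
	}
	p->count = 0;
}
```

The trade-off is memory held per connection even while idle, in exchange for a get path that cannot fail for lack of system memory.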
Chuck Lever [Thu, 7 Jan 2016 19:49:03 +0000 (14:49 -0500)]
svcrdma: Clean up process_context()
Be sure the completed ctxt is put in every path.
The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.
Remove/disable debugging.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:48:55 +0000 (14:48 -0500)]
svcrdma: Clean up rdma_create_xprt()
kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Mon, 4 Jan 2016 08:49:54 +0000 (10:49 +0200)]
IB/core: Use hop-limit from IP stack for RoCE
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fix that by taking the hop limit values from ip4_dst_hoplimit and
ip6_dst_hoplimit.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Mon, 4 Jan 2016 08:49:53 +0000 (10:49 +0200)]
IB/core: Rename rdma_addr_find_dmac_by_grh
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a
downstream patch will also add hop_limit as an output parameter,
so rename it to rdma_addr_find_l2_eth_by_grh.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 1 Jan 2016 12:17:46 +0000 (13:17 +0100)]
IB/cm: Fix a recently introduced deadlock
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.
This patch fixes e.g. the following deadlock:
=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
[<ffffffff810a3c11>] mark_held_locks+0x71/0x90
[<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
[<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
[<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
[<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
[<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
[<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
[<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
[<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
[<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
[<ffffffff8107184d>] process_one_work+0x1bd/0x460
[<ffffffff81073148>] worker_thread+0x118/0x420
[<ffffffff81078454>] kthread+0xe4/0x100
[<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&cm_id_priv->lock)->rlock);
<Interrupt>
lock(&(&cm_id_priv->lock)->rlock);
*** DEADLOCK ***
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
<IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74
[<ffffffff810a32f4>] print_usage_bug+0x184/0x190
[<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
[<ffffffff810a3995>] mark_lock+0x115/0x1b0
[<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
[<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
[<ffffffff810a53c2>] lock_acquire+0x62/0x80
[<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
[<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
[<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
[<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
[<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
[<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
[<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
[<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
[<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
[<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
[<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
[<ffffffff81007f3d>] handle_irq+0x5d/0x130
[<ffffffff810071fd>] do_IRQ+0x6d/0x130
[<ffffffff8151d309>] common_interrupt+0x89/0x89
<EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
[<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
[<ffffffff810990d6>] call_cpuidle+0x36/0x60
[<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
[<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
[<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
[<ffffffff8103c443>] start_secondary+0x83/0x90
Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
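The difference between unconditionally re-enabling interrupts and restoring the caller's interrupt state can be modelled in userspace. This sketch assumes that save/restore locking, in the spirit of spin_lock_irqsave()/spin_unlock_irqrestore(), is how a helper avoids enabling interrupts when its caller had them disabled. The boolean irqs_enabled and every function below (lock_irq(), lock_irqsave(), enter_timewait_buggy(), enter_timewait_fixed(), caller_keeps_irqs_off()) are hypothetical stand-ins, not the ib_cm code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated "interrupts enabled" state of the local CPU (model only). */
static bool irqs_enabled = true;

typedef unsigned long irqflags_t;   /* hypothetical saved-state type */

/* Modelled on spin_lock_irq()/spin_unlock_irq(): the unlock side
 * unconditionally re-enables interrupts. */
static void lock_irq(void)   { irqs_enabled = false; }
static void unlock_irq(void) { irqs_enabled = true; }

/* Modelled on spin_lock_irqsave()/spin_unlock_irqrestore(): remember
 * the previous state and restore exactly that. */
static void lock_irqsave(irqflags_t *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

static void unlock_irqrestore(irqflags_t flags)
{
	irqs_enabled = flags != 0;
}

/* Helper that enables interrupts even if its caller had them off. */
static void enter_timewait_buggy(void)
{
	lock_irq();
	/* ... move the connection to timewait ... */
	unlock_irq();
}

/* Helper that leaves the caller's interrupt state untouched. */
static void enter_timewait_fixed(void)
{
	irqflags_t flags;

	lock_irqsave(&flags);
	/* ... move the connection to timewait ... */
	unlock_irqrestore(flags);
}

/* Caller holding an IRQ-safe lock; reports whether interrupts were
 * still disabled after the helper returned. */
static bool caller_keeps_irqs_off(void (*helper)(void))
{
	irqflags_t flags;
	bool still_off;

	lock_irqsave(&flags);
	helper();
	still_off = !irqs_enabled;
	unlock_irqrestore(flags);
	return still_off;
}
```

With the buggy helper, an interrupt could arrive while the caller still holds a lock that the interrupt handler also takes, which is the scenario lockdep reports above.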
Bart Van Assche [Thu, 31 Dec 2015 08:56:43 +0000 (09:56 +0100)]
IB/srpt: Fix the RDMA completion handlers
Avoid triggering the following kernel crash when processing
an RDMA completion:
BUG: unable to handle kernel paging request at 0000000100000198
IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
Call Trace:
[<ffffffff810a53c2>] lock_acquire+0x62/0x80
[<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
[<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
[<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
[<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
[<ffffffff8107184d>] process_one_work+0x1bd/0x460
[<ffffffff81073148>] worker_thread+0x118/0x420
[<ffffffff81078454>] kthread+0xe4/0x100
[<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
Fixes: commit 59fae4deaad3 ("IB/srpt: chain RDMA READ/WRITE requests").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Thu, 31 Dec 2015 08:56:03 +0000 (09:56 +0100)]
irq_poll: Fix irq_poll_sched()
The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing.
This means that irq_poll_sched() must proceed if this bit has
not yet been set.
Fixes: commit ea51190c0315 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
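A minimal sketch of the "proceed only if the bit was not yet set" rule, assuming C11 atomics in place of the kernel's test_and_set_bit(), which likewise returns the bit's previous value. struct poller, POLL_F_SCHED, poll_sched() and poll_complete() are hypothetical names, not the irq_poll API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define POLL_F_SCHED (1UL << 0)   /* hypothetical stand-in for IRQ_POLL_F_SCHED */

struct poller {
	atomic_ulong state;
	int scheduled;                /* how many times polling was queued */
};

/* Atomically set the bit and report whether it was already set. */
static bool test_and_set_sched(struct poller *p)
{
	unsigned long old = atomic_fetch_or(&p->state, POLL_F_SCHED);
	return (old & POLL_F_SCHED) != 0;
}

/* Queue polling only when the bit was NOT previously set; if it was,
 * polling is already ongoing and there is nothing to do. */
static void poll_sched(struct poller *p)
{
	if (test_and_set_sched(p))
		return;
	p->scheduled++;
}

/* Polling finished: clear the bit so the next event can schedule again. */
static void poll_complete(struct poller *p)
{
	atomic_fetch_and(&p->state, ~POLL_F_SCHED);
}
```

Inverting the test in poll_sched() would mean the first event never schedules polling, since the bit starts clear.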
Matan Barak [Wed, 30 Dec 2015 14:14:18 +0000 (16:14 +0200)]
IB/core: Fix dereference before check
Sparse complains about a dereference before a check. Fix this by
moving the check before the dereference.
Fixes: 200298326b27 ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 30 Dec 2015 14:14:17 +0000 (16:14 +0200)]
IB/core: Eliminate sparse false context imbalance warning
When write_gid needs to perform a sleepable operation, it unlocks
table->rwlock and then relocks it. Sparse complains about a context
imbalance.
This is safe, as write_gid is always called with table->rwlock held.
write_gid protects against simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.
Fixes: 9c584f049596 ('IB/core: Change per-entry lock in RoCE GID table to one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hal Rosenstock [Tue, 29 Dec 2015 10:43:43 +0000 (05:43 -0500)]
IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling
The port number is not part of the ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.
To handle this attribute properly, port_num is added as a
parameter to get_counter_table, and get_perf_mad was changed
not to store port_num in the attribute itself when it is
querying the ClassPortInfo attribute.
This addresses an issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>
Fixes: 145d9c541032 ('IB/core: Display extended counter set if available')
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 29 Dec 2015 09:45:03 +0000 (10:45 +0100)]
IB/core: Remove set-but-not-used variable from ib_sg_to_pages()
Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a).
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 9 Jan 2016 11:06:25 +0000 (13:06 +0200)]
IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce
Fix static checker warning:
drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce()
warn: passing casted pointer '&props->qkey_viol_cntr' to
'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16.
Fixes: 3f89a643eb29 ("IB/mlx5: Extend query_device/port to support RoCE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
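The "32 vs 16" warning is about handing a pointer to a 16-bit field to a function that writes 32 bits through it. A minimal sketch of the safe pattern, reading into a correctly sized temporary and then narrowing explicitly; query_qkey_viol_cntr(), struct port_props and fill_props() are hypothetical stand-ins, and the actual mlx5 fix may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper that reports a 32-bit counter through an
 * out-parameter, like the helper named in the warning. */
static int query_qkey_viol_cntr(uint32_t *out)
{
	*out = 0x12345;          /* wider than 16 bits, for illustration */
	return 0;
}

/* Hypothetical properties struct: a 16-bit counter followed by
 * another field that a 32-bit write would clobber. */
struct port_props {
	uint16_t qkey_viol_cntr;
	uint16_t neighbour;
};

/* Passing (uint32_t *)&props->qkey_viol_cntr would write 4 bytes into a
 * 2-byte field. Instead, read into a uint32_t and narrow explicitly
 * (here by truncation to the low 16 bits). */
static int fill_props(struct port_props *props)
{
	uint32_t tmp;
	int err = query_qkey_viol_cntr(&tmp);

	if (err)
		return err;
	props->qkey_viol_cntr = (uint16_t)tmp;
	return 0;
}
```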
Christoph Hellwig [Wed, 6 Jan 2016 06:46:12 +0000 (22:46 -0800)]
IB/mad: use CQ abstraction
Remove the local workqueue to process mad completions and use the CQ API
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Mon, 4 Jan 2016 13:15:58 +0000 (14:15 +0100)]
IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
Stop abusing wr_id and just pass the parameter explicitly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Lucas Tanure [Tue, 19 Jan 2016 14:06:30 +0000 (12:06 -0200)]
infiniband: Replace memset with eth_zero_addr
Use eth_zero_addr to assign the zero address to the given address
array instead of memset when the second argument of memset is zero.
Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Tue, 19 Jan 2016 09:11:24 +0000 (11:11 +0200)]
IB/mlx5: Delete locally redefined variable
Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here
Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:48:07 +0000 (17:48 +0200)]
net/mlx4: Remove unused macro
The macro mlx4_foreach_non_ib_transport_port() is not used anywhere. Remove it.
Fixes: aa9a2d51a3e7 ("mlx4: Activate RoCE/SRIOV")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:47:38 +0000 (17:47 +0200)]
IB/mlx4: Take source mac from AH instead from the port
Commit dbf727de7440 ("IB/core: Use GID table in AH creation and dmac
resolution") copies the source MAC into mlx4_ah from the attributes of the
GID at ib_ah_attr.grh.sgid_index. Now we can use it instead of the port's MAC.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 14 Jan 2016 15:47:02 +0000 (17:47 +0200)]
IB/mlx4: Initialize hop_limit when creating address handle
The hop limit value wasn't copied from the attributes when the AH was
created. This may cause packets for unconnected services to be dropped
by routers when the endpoints are not in the same subnet.
Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 14 Jan 2016 06:11:40 +0000 (08:11 +0200)]
IB/mlx5: Expose correct maximum number of CQE capacity
The maximum EQ capacity (number of EQEs) was mistakenly exposed as the
maximum number of CQEs per CQ. Fix that.
Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Wed, 13 Jan 2016 04:33:14 +0000 (10:03 +0530)]
iw_cxgb4: Take clip reference before starting IPv6 listen
The hardware is designed such that a valid clip entry must exist
before anything IPv6-related is done. So take a clip reference before
creating IPv6 listening servers, and release the clip entry if we
fail to create the server.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
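The ordering described above (take the reference first, release it if the later step fails) is the usual acquire/rollback pattern. A minimal sketch with hypothetical names: struct clip_entry, clip_get(), clip_put(), hw_create_listen() and create_ipv6_listen() are not the iw_cxgb4 or cxgb4 API, and the failure is simulated with a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted clip entry. */
struct clip_entry {
	int refcnt;
};

static void clip_get(struct clip_entry *c) { c->refcnt++; }
static void clip_put(struct clip_entry *c) { c->refcnt--; }

/* Hypothetical stand-in for creating the hardware listening server. */
static int hw_create_listen(bool simulate_failure)
{
	return simulate_failure ? -1 : 0;
}

/* The hardware needs a valid clip entry before any IPv6 operation, so
 * take the reference first and drop it again if creation fails. */
static int create_ipv6_listen(struct clip_entry *c, bool simulate_failure)
{
	int err;

	clip_get(c);
	err = hw_create_listen(simulate_failure);
	if (err)
		clip_put(c);
	return err;
}
```

On success the reference is kept for the lifetime of the listening server and would be dropped when it is destroyed.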
Hariprasad S [Tue, 12 Jan 2016 11:03:22 +0000 (16:33 +0530)]
iw_cxgb4: Fixes GW-Basic labels to meaningful error names
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Tue, 12 Jan 2016 11:03:21 +0000 (16:33 +0530)]
iw_cxgb4: Fixes static checker warning in c4iw_rdev_open()
Commit c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
from Dec 11, 2015, leads to the following static checker warning:
drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open()
warn: variable dereferenced before check 'rdev->status_page'
Also, we weren't deallocating the ocqp pool in the error path when
allocating the status page failed. Fix that too.
Fixes: c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
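The two iw_cxgb4 commits above touch the same idiom: goto-based error unwinding with labels named after what they undo, where each failure jumps to the label that releases everything acquired so far. A minimal userspace sketch, assuming three resources that stand in for the qid range, the ocqp pool and the status page; struct rdev, alloc_step() and rdev_open() are hypothetical, and the real c4iw_rdev_open() is more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device state. */
struct rdev {
	void *qids;
	void *ocqp_pool;
	void *status_page;
};

/* fail_at selects which step to simulate as failing (0 = none). */
static void *alloc_step(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(16);
}

/* Each failure jumps to the label that frees everything allocated so
 * far, in reverse order of allocation. */
static int rdev_open(struct rdev *r, int fail_at)
{
	r->qids = alloc_step(1, fail_at);
	if (!r->qids)
		goto err_out;

	r->ocqp_pool = alloc_step(2, fail_at);
	if (!r->ocqp_pool)
		goto err_free_qids;

	r->status_page = alloc_step(3, fail_at);
	if (!r->status_page)
		goto err_free_ocqp;   /* must also release the ocqp pool */

	return 0;

err_free_ocqp:
	free(r->ocqp_pool);
	r->ocqp_pool = NULL;
err_free_qids:
	free(r->qids);
	r->qids = NULL;
err_out:
	return -1;
}
```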
Dan Carpenter [Tue, 12 Jan 2016 09:29:21 +0000 (12:29 +0300)]
IB/cma: allocating too much memory in make_cma_ports()
The issue here is that there is a cut and paste bug. When we allocate
cma_dev_group->default_ports_group we use "sizeof(*cma_dev_group->ports)"
instead of "sizeof(*cma_dev_group->default_ports_group)".
We're bumping up against the 80 character limit so I introduced a new
local pointer "ports_group" to get around that.
Fixes: 045959db65c6 ('IB/cma: Add configfs for rdma_cm')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
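A minimal sketch of the sizeof(*ptr) idiom and the short local pointer described above. The structure names (struct port_group, struct ports_group, struct dev_group) and make_ports() are hypothetical stand-ins for the configfs groups; this is not the actual make_cma_ports() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element types of deliberately different sizes. */
struct port_group  { char big[256]; };
struct ports_group { char small[16]; };

struct dev_group {
	struct port_group *ports;
	struct ports_group *default_ports_group;
};

/* Deriving each size from the pointer being assigned keeps allocation
 * and destination in sync; a copied line whose sizeof operand was not
 * updated is exactly the cut-and-paste bug described above. The short
 * local pointer keeps the line within 80 columns. */
static int make_ports(struct dev_group *g, size_t nports)
{
	struct ports_group *ports_group;

	g->ports = calloc(nports, sizeof(*g->ports));
	if (!g->ports)
		return -1;

	ports_group = calloc(nports, sizeof(*ports_group));
	if (!ports_group) {
		free(g->ports);
		g->ports = NULL;
		return -1;
	}
	g->default_ports_group = ports_group;
	return 0;
}
```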
Dan Carpenter [Tue, 12 Jan 2016 09:27:43 +0000 (12:27 +0300)]
RDMA/nes: checking for NULL instead of IS_ERR
nes_reg_phys_mr() returns ERR_PTRs on error. It doesn't return NULL.
This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
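Why the NULL check is wrong can be shown with a userspace re-creation of the kernel's ERR_PTR convention, assuming the same encoding the kernel uses: error codes up to MAX_ERRNO (4095) are packed into the top of the address space, so an error return is never NULL. reg_phys_mr(), struct mr and use_mr() are hypothetical; nes_reg_phys_mr() itself is not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the kernel helpers' logic (see include/linux/err.h). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct mr { int lkey; };

/* Hypothetical registration function following the ERR_PTR convention. */
static struct mr *reg_phys_mr(bool fail)
{
	static struct mr the_mr = { .lkey = 7 };
	return fail ? ERR_PTR(-ENOMEM) : &the_mr;
}

/* Correct caller: test with IS_ERR(); a NULL test never fires here and
 * the error pointer would then be dereferenced. */
static int use_mr(bool fail)
{
	struct mr *mr = reg_phys_mr(fail);

	if (IS_ERR(mr))
		return (int)PTR_ERR(mr);
	return mr->lkey;
}
```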
Vinit Agnihotri [Mon, 11 Jan 2016 17:57:25 +0000 (12:57 -0500)]
IB/qib: Support creating qps with GFP_NOIO flag
The current code is problematic when IPoIB is used to support NFS and
NFS needs to do I/O for paging purposes. In that case, the GFP_KERNEL
allocation in qib_qp.c can deadlock in tight memory situations.
This fix adds support for creating queue pairs with the GFP_NOIO flag,
for connected mode only, so that queue pair creation fails cleanly in
those situations.
Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Mon, 4 Jan 2016 03:44:25 +0000 (22:44 -0500)]
IB/sysfs: Fix sparse warning on attr_id
The attribute ID was declared as an int, while the value should really be
a big-endian 16-bit value.
Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
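Treating attr_id as a big-endian 16-bit quantity means converting at the boundary. A minimal userspace sketch, assuming htons()/ntohs() from <arpa/inet.h> in place of the kernel's cpu_to_be16()/be16_to_cpu(); struct mad_hdr_frag, set_attr_id() and get_attr_id() are hypothetical, not the real MAD header code. Userspace has no sparse __be16 annotation, so the field is a plain uint16_t that is only ever written through htons().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical header fragment: attr_id is stored in network
 * (big-endian) byte order, as it travels on the wire. */
struct mad_hdr_frag {
	uint16_t attr_id;
};

static void set_attr_id(struct mad_hdr_frag *h, uint16_t host_id)
{
	h->attr_id = htons(host_id);
}

static uint16_t get_attr_id(const struct mad_hdr_frag *h)
{
	return ntohs(h->attr_id);
}
```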
Devesh Sharma [Thu, 24 Dec 2015 18:14:08 +0000 (13:14 -0500)]
RDMA/be2net: Remove open and close entry points
Recently Doug Ledford reported a deadlock between the ocrdma load
sequence and the NetworkManager service issuing "open" on a be2net
interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel with insmod ocrdma.ko.
A. be2net sends an administrative open/close event to ocrdma while holding
device_list_mutex. It does this from the ndo_open/ndo_stop hooks of be2net.
So the lock order is rtnl_lock ---> device_list lock.
B. When a new ocrdma RoCE device gets registered, the InfiniBand stack now
takes rtnl_lock in ib_register_device() in the GID initialization routines.
So the lock order in this path is device_list lock ---> rtnl_lock.
This inconsistent lock ordering causes the deadlock.
In order to resolve the above deadlock, ocrdma introduced a
patch to stop listening to administrative open/close events generated by
the be2net driver. It now depends on link-state-change async events generated
by the CNA. This change leaves behind dead code which used to generate
administrative open/close events. This patch cleans up all that dead code
from be2net.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:07 +0000 (13:14 -0500)]
RDMA/ocrdma: Depend on async link events from CNA
Recently Doug Ledford reported a deadlock between the ocrdma load
sequence and the NetworkManager service issuing "open" on a be2net
interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel with insmod ocrdma.ko.
A. be2net sends an administrative open/close event to ocrdma while holding
device_list_mutex. It does this from the ndo_open/ndo_stop hooks of be2net.
So the lock order is rtnl_lock ---> device_list lock.
B. When a new ocrdma RoCE device gets registered, the InfiniBand stack now
takes rtnl_lock in ib_register_device() in the GID initialization routines.
So the lock order in this path is device_list lock ---> rtnl_lock.
This inconsistent lock ordering causes the deadlock.
With this patch we stop using the administrative open and close events
injected by the be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB stack. This patch implements logic
to receive async link events generated by the CNA whenever a link-state change
is detected. From now on, these async events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to the IB stack.
Depending on async events from the CNA removes the need to hold
device_list_mutex and thus breaks the deadlock.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:06 +0000 (13:14 -0500)]
RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only a port event to the IB stack when the port state changes.
Don't explicitly move QPs to the error state. Let applications listen for
port events on the async event queue, or let QPs fail with a retry-exceeded
completion error.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:05 +0000 (13:14 -0500)]
RDMA/ocrdma: Fix vlan-id assignment in qp parameters
The vlan-id is wrongly set to 0 when PFC is enabled.
Set the vlan-id configured by the user in the QP parameters.
If a VLAN interface is not used, print a warning asking the
user to configure a VLAN, and set the vlan-id to 0 in the QP parameters.
Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 7 Jan 2016 09:19:29 +0000 (11:19 +0200)]
IB/cma: Fix RDMA port validation for iWarp
cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke iWARP support. Fix that by matching the ndev only if
we are working on a RoCE port.
Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 7 Jan 2016 21:44:10 +0000 (16:44 -0500)]
IB/qib: fix mcast detach when qp not attached
The code produces the following trace:
[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D 3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge 860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti: ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70 EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX: 000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI: 6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09: 0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12: ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15: 0000000000000000
[1750924.420364] FS: 00007ffff7fd8740(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[1750924.420364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4: 00000000000007e0
[1750924.420364] Stack:
[1750924.420364] ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429 000000007af1de20
[1750924.420364] ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70 ffffffffa00cb313
[1750924.420364] 00007fffffffde88 0000000000000000 0000000000000008 ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364] [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350 [ib_qib]
[1750924.568035] [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0 [ib_uverbs]
[1750924.568035] [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035] [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170 [ib_uverbs]
[1750924.568035] [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035] [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035] [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035] [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035] [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035] [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
<f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.568035] RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]
The fix is to record the qib_mcast_qp that was found. If none is found,
return -EINVAL to indicate the error.
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
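A minimal sketch of the fix described above: record the matching entry during the search and return -EINVAL when nothing matched, instead of acting on whatever the loop cursor points to afterwards. struct mcast_qp and mcast_detach() are hypothetical simplifications of the qib multicast structures (a singly linked list here, where qib uses kernel lists).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical list of QPs attached to one multicast group. */
struct mcast_qp {
	int qpn;
	struct mcast_qp *next;
};

/* Unlink the entry for qpn and hand it back through *removed.
 * delp stays NULL unless a match was found, so a QP that was never
 * attached yields -EINVAL instead of a bogus free. */
static int mcast_detach(struct mcast_qp **head, int qpn,
			struct mcast_qp **removed)
{
	struct mcast_qp **pp, *delp = NULL;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->qpn == qpn) {
			delp = *pp;
			*pp = delp->next;   /* unlink */
			break;
		}
	}
	if (!delp)
		return -EINVAL;
	*removed = delp;                    /* caller frees it */
	return 0;
}
```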
Erez Shitrit [Thu, 7 Jan 2016 07:28:08 +0000 (09:28 +0200)]
IB/IPoIB: Fix kernel panic on multicast flow
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary variable (used as an
iterator) that may be uninitialized.
There is no need to pass dev to the function, as each mcast
has its own dev as a member of the mcast struct.
The bug causes the following panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
process_one_work+0x164/0x470
worker_thread+0x11d/0x420
...
Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Thu, 24 Dec 2015 10:20:48 +0000 (12:20 +0200)]
IB/iser: Support the remote invalidation exception
Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory
Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target remotely invalidating our rkey, so we must check for it
upon a receive (SCSI response) completion.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
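A minimal sketch of the receive-side check, using a simplified stand-in for the work completion. In the kernel the relevant fields are wc_flags with IB_WC_WITH_INVALIDATE and ex.invalidate_rkey in struct ib_wc, but struct wc, WC_WITH_INVALIDATE, struct task_reg, handle_rsp_completion() and needs_local_invalidate() below are hypothetical. The registration is treated as already invalidated only when the completion says so and the rkey matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the relevant work-completion fields. */
#define WC_WITH_INVALIDATE (1u << 0)

struct wc {
	unsigned int flags;
	uint32_t invalidate_rkey;
};

/* Hypothetical per-task registration state. */
struct task_reg {
	uint32_t rkey;
	bool mr_valid;   /* true while the MR still needs invalidating */
};

/* On a receive (SCSI response) completion: if the target remotely
 * invalidated our rkey, skip the local invalidate. We may not assume
 * the target did so, hence the explicit check. */
static void handle_rsp_completion(const struct wc *wc, struct task_reg *reg)
{
	if ((wc->flags & WC_WITH_INVALIDATE) &&
	    wc->invalidate_rkey == reg->rkey)
		reg->mr_valid = false;
}

static bool needs_local_invalidate(const struct task_reg *reg)
{
	return reg->mr_valid;
}
```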
Sagi Grimberg [Wed, 9 Dec 2015 12:12:07 +0000 (14:12 +0200)]
IB/iser: Change the increment rkey flow logic
When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys. So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
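Tying the rkey refresh to registration can be sketched as follows. inc_rkey() uses the same arithmetic as ib_inc_rkey() in <rdma/ib_verbs.h> (bump the low 8 key bits, keep the rest); struct fast_reg and register_mem() are hypothetical and only illustrate that a fresh rkey is obtained on every registration, whether or not a local invalidate ran.

```c
#include <assert.h>
#include <stdint.h>

/* Increment only the low 8-bit key portion of an rkey, leaving the
 * upper (index) bits unchanged and wrapping 0xff back to 0x00. */
static uint32_t inc_rkey(uint32_t rkey)
{
	const uint32_t mask = 0x000000ff;
	return ((rkey + 1) & mask) | (rkey & ~mask);
}

/* Hypothetical fast-registration descriptor. */
struct fast_reg {
	uint32_t rkey;
	int regs;        /* registrations performed */
};

/* The rkey is refreshed on every registration, so it still changes
 * when a remote invalidate replaced the local one. */
static uint32_t register_mem(struct fast_reg *fr)
{
	fr->rkey = inc_rkey(fr->rkey);
	fr->regs++;
	return fr->rkey;
}
```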
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:06 +0000 (14:12 +0200)]
IB/isert: Support the remote invalidation exception
We'll use remote invalidation according to the negotiation result
during connection establishment. If the initiator declared that
it supports the remote invalidate exception and the local HCA
supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will
use IB_WR_SEND_WITH_INV with the correct rkey for the response.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:05 +0000 (14:12 +0200)]
IB/isert: Declare correct flags when accepting a connection
The iSER target does not support zero-based virtual addresses or
send with invalidate, so it should declare that it doesn't.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:04 +0000 (14:12 +0200)]
IB/isert: Remove unused file iser_proto.h
We don't need iser_proto.h anymore, remove it and
move (non-protocol) declarations to ib_isert.h
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:03 +0000 (14:12 +0200)]
IB/iser,isert: Create and use new shared header
The iser RDMA_CM negotiation protocol is shared by
the initiator and the target, so have a shared header
for the defines and structure. Move relevant items from
the initiator and target headers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:02 +0000 (14:12 +0200)]
IB/iser: set intuitive values for mr_valid
This parameter is described as "is mr valid indicator".
In other words, it indicates whether memory registration
is valid or not. So intuitive values would be:
mr_valid=True, when memory registration is valid and
mr_valid=False otherwise.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:01 +0000 (14:12 +0200)]
IB/iser: Don't register memory for all immediate data writes
When all the task data is sent as immediate data, we are
allowed to use the local_dma_lkey, since the key is never
sent on the wire.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:00 +0000 (14:12 +0200)]
IB/iser: Reuse ib_sg_to_pages
iser has its own iser_sg_to_page_vec, which plays exactly
the same role as ib_sg_to_pages. Customize the page_vec
to hold a fake MR so we can reuse ib_sg_to_pages.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Roi Dayan [Wed, 9 Dec 2015 12:11:59 +0000 (14:11 +0200)]
IB/iser: Fix module init not cleaning up on error flow
Destroy the workqueue on a transport registration error, and
release the kmem cache on a workqueue allocation error.
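The fix follows the usual kernel idiom of unwinding a partially completed init in reverse order with goto labels. The user-space sketch below illustrates that idiom only; `struct module_state`, `module_init_sketch()` and the failure-injection flags are hypothetical stand-ins for the kmem cache, the workqueue and the transport registration, not the actual ib_iser code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the module's resources. */
struct module_state {
	void *cache;      /* plays the role of the kmem cache */
	void *wq;         /* plays the role of the workqueue */
	bool registered;  /* plays the role of the transport registration */
};

/* Failure injection knobs so the error paths can be exercised. */
static bool fail_wq_alloc;
static bool fail_transport_register;

static int module_init_sketch(struct module_state *st)
{
	int err;

	st->cache = malloc(64);
	if (!st->cache)
		return -1;

	st->wq = fail_wq_alloc ? NULL : malloc(64);
	if (!st->wq) {
		err = -2;
		goto err_free_cache;   /* wq failed: release the cache */
	}

	if (fail_transport_register) {
		err = -3;
		goto err_destroy_wq;   /* register failed: destroy the wq too */
	}
	st->registered = true;
	return 0;

	/* Labels run in reverse order of acquisition. */
err_destroy_wq:
	free(st->wq);
	st->wq = NULL;
err_free_cache:
	free(st->cache);
	st->cache = NULL;
	return err;
}
```

Each label releases exactly what was acquired before the failing step, so adding a new resource only needs one new label.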
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sun, 29 Nov 2015 22:02:51 +0000 (23:02 +0100)]
IB/core: constify mmu_notifier_ops structures
This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.
Done with the help of Coccinelle.
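Declaring an operations table `const` lets the compiler place it in read-only memory, so its function pointers cannot be overwritten at run time; consumers then only need a pointer-to-const. A minimal user-space sketch of the pattern, where `struct demo_ops` and its callbacks are hypothetical and do not reproduce the real mmu_notifier_ops, iser_reg_ops or nes_cm_ops layouts:

```c
/* Hypothetical ops table with two callbacks. */
struct demo_ops {
	int (*start)(int arg);
	void (*stop)(void);
};

static int demo_start(int arg) { return arg + 1; }
static void demo_stop(void) { }

/* const: the table is never modified, so it can live in read-only data. */
static const struct demo_ops demo_ops_table = {
	.start = demo_start,
	.stop  = demo_stop,
};

/* Callers only read the table, so they take a pointer-to-const. */
static int run_ops(const struct demo_ops *ops, int arg)
{
	int ret = ops->start(arg);

	ops->stop();
	return ret;
}
```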
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 15:52:04 +0000 (16:52 +0100)]
IB/iser: constify iser_reg_ops structure
The iser_reg_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 14:00:37 +0000 (15:00 +0100)]
RDMA/nes: constify nes_cm_ops structure
The nes_cm_ops structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bodong Wang [Fri, 18 Dec 2015 11:53:20 +0000 (13:53 +0200)]
IB/mlx5: report tx/rx checksum cap in query results
This patch reports the TX/RX checksum capability for raw QPs via the
query device results.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:53 +0000 (09:31 +0200)]
IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch
Convert kmalloc to kmalloc_array to fix the checkpatch warnings below:
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ srq->wrid = kmalloc(srq->msrq.max * sizeof(u64),
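Beyond silencing checkpatch, kmalloc_array(n, size, flags) returns NULL when n * size would overflow instead of allocating a truncated buffer. The user-space sketch below shows the same overflow check layered on malloc(); `alloc_array()` is a hypothetical helper, not a kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of the given size, refusing any request whose
 * total byte count would overflow size_t. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;           /* n * size would overflow */
	return malloc(n * size);
}
```

With an open-coded `kmalloc(cnt * sizeof(u64), ...)`, a huge `cnt` could wrap around to a small size and later writes would run past the buffer; the division-based check rejects it up front.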
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:52 +0000 (09:31 +0200)]
IB/mlx4: Suppress non-fatal memory allocations
A failed kmalloc() allocation emits a warning. Such warnings are no
longer needed, since commit 0ef2f05c7e02 ("IB/mlx4: Use vmalloc for WR
buffers when needed") added a fallback from kmalloc() to __vmalloc().
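Because a fallback exists, a failed first attempt is expected and harmless, which is why its warning can be suppressed (in the kernel, typically by passing __GFP_NOWARN to kmalloc()). The sketch below models the idea in user space; `try_fast_alloc()`, `FAST_ALLOC_LIMIT` and `alloc_wr_buffer()` are hypothetical, with calloc() standing in for __vmalloc().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size limit imitating a contiguous allocator that
 * cannot satisfy large requests. */
#define FAST_ALLOC_LIMIT 4096

static int used_fallback;

static void *try_fast_alloc(size_t size)
{
	if (size > FAST_ALLOC_LIMIT)
		return NULL;           /* fail quietly: no warning logged */
	return calloc(1, size);
}

/* Try the fast allocator first; fall back silently if it fails. */
static void *alloc_wr_buffer(size_t size)
{
	void *p = try_fast_alloc(size);

	used_fallback = 0;
	if (!p) {
		used_fallback = 1;
		p = calloc(1, size);   /* stands in for __vmalloc() */
	}
	return p;
}
```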
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:10 +0000 (16:34 +0200)]
IB/mlx5: Advertise atomic capabilities in query device
To ensure IB-spec correctness of atomic operations, advertise
IB_ATOMIC_HCA if the HW is configured to host endianness; if not,
advertise IB_ATOMIC_NONE.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:09 +0000 (16:34 +0200)]
net/mlx5_core: Add setting ATOMIC endian mode
The HW supports two requestor endianness modes for standard 8-byte
atomics: BE (0x0) and host endianness (0x1). Read the supported modes
from the HCA atomic capabilities and configure the HW to host endianness
mode if it is supported.
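Together with the IB/mlx5 "Advertise atomic capabilities" commit above, the decision flow can be sketched as: pick host endianness when supported, then advertise HCA-level atomics only in that mode. The mode values 0x0/0x1 come from the text; `struct hca_sketch`, the bitmask encoding of supported modes and the local `enum atomic_cap` (mirroring IB_ATOMIC_NONE/IB_ATOMIC_HCA) are assumptions for illustration.

```c
#include <stdint.h>

/* Requestor endianness modes, as described in the commit. */
#define ATOMIC_MODE_BE   0x0
#define ATOMIC_MODE_HOST 0x1

/* Hypothetical local mirror of the two IB atomic capability levels. */
enum atomic_cap { ATOMIC_CAP_NONE, ATOMIC_CAP_HCA };

struct hca_sketch {
	uint32_t supported_modes;  /* assumed: bit N set => mode N supported */
	uint32_t configured_mode;
};

/* Configure host endianness if the HCA supports it, else keep BE. */
static void configure_atomic_mode(struct hca_sketch *hca)
{
	if (hca->supported_modes & (1u << ATOMIC_MODE_HOST))
		hca->configured_mode = ATOMIC_MODE_HOST;
	else
		hca->configured_mode = ATOMIC_MODE_BE;
}

/* Advertise HCA-level atomics only when running in host-endian mode. */
static enum atomic_cap advertised_atomic_cap(const struct hca_sketch *hca)
{
	return hca->configured_mode == ATOMIC_MODE_HOST ? ATOMIC_CAP_HCA
							 : ATOMIC_CAP_NONE;
}
```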
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 08:29:17 +0000 (13:59 +0530)]
iw_cxgb3: Fix incorrectly returning error on success
The cxgb3_*_send() functions return NET_XMIT_ values, which are
non-negative integer values. So don't treat positive return values
as an error.
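In the kernel, NET_XMIT_SUCCESS is 0 and codes such as NET_XMIT_DROP and NET_XMIT_CN are small positive values, so a caller that tests `ret != 0` misreports them as failures, while `ret < 0` only catches real errors. The sketch mirrors those constants locally; `fake_send()`, `send_buggy()` and `send_fixed()` are hypothetical.

```c
#include <errno.h>

/* Local mirrors of the kernel's NET_XMIT_* values (linux/netdevice.h). */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical send routine: negative errno on failure,
 * otherwise a non-negative NET_XMIT_* code. */
static int fake_send(int outcome)
{
	return outcome;
}

/* Buggy check: treats any non-zero value, e.g. NET_XMIT_CN, as failure. */
static int send_buggy(int outcome)
{
	int ret = fake_send(outcome);

	return ret ? -1 : 0;
}

/* Fixed check: only negative values are errors. */
static int send_fixed(int outcome)
{
	int ret = fake_send(outcome);

	return ret < 0 ? ret : 0;
}
```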
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 07:32:01 +0000 (13:02 +0530)]
iw_cxgb4: Pass qid range to user space driver
Enhance the t4_dev_status_page to pass the qid start and size
attributes from iw_cxgb4 to libcxgb4.
Bump the ABI version to 3 so that libcxgb4 can detect older drivers and
fall back to the old way of computing the qid ranges.
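On the library side, the ABI bump lets the consumer decide whether the new fields can be trusted. Only the idea of gating on ABI version 3 comes from the commit; `struct status_page_sketch`, its field names and `legacy_qid_range()` (with its placeholder values) are hypothetical.

```c
#include <stdint.h>

#define QID_RANGE_ABI_VERSION 3   /* first ABI that exports qid start/size */

struct status_page_sketch {
	uint32_t qid_start;   /* valid only when abi_version >= 3 */
	uint32_t qid_size;
};

struct qid_range { uint32_t start, size; };

/* Hypothetical stand-in for the pre-ABI-3 way of computing the range. */
static struct qid_range legacy_qid_range(void)
{
	struct qid_range r = { .start = 0, .size = 1024 };

	return r;
}

static struct qid_range get_qid_range(int abi_version,
				      const struct status_page_sketch *sp)
{
	if (abi_version >= QID_RANGE_ABI_VERSION) {
		struct qid_range r = { sp->qid_start, sp->qid_size };

		return r;
	}
	return legacy_qid_range();   /* old driver: fall back */
}
```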
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 10 Dec 2015 21:52:30 +0000 (16:52 -0500)]
IB/mad: Ensure fairness in ib_mad_completion_handler
It was found that when a process was rapidly sending MADs, other processes could
hang in their unregister calls.
This happened when process A injected packets fast enough that the
single-threaded workqueue never exited ib_mad_completion_handler.
Therefore, when process B called flush_workqueue via the unregister call, it
would hang until process A stopped sending MADs.
The fix is to periodically reschedule ib_mad_completion_handler after
processing a large number of completions. The number of completions was
chosen based on the default recv queue size. However, it is kept fixed so
that increasing those queue sizes in the future will not adversely affect
fairness.
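The rescheduling pattern can be sketched as a handler that processes at most a fixed budget of completions per invocation and, if work remains, requeues itself instead of looping, giving other items on the single-threaded queue (such as a flush) a turn. In this user-space model, `struct cq_sketch`, `requeue_handler()` and the budget of 100 are hypothetical.

```c
#include <stdbool.h>

#define COMPLETION_BUDGET 100   /* hypothetical fixed per-run limit */

struct cq_sketch {
	int pending;     /* completions waiting to be processed */
	int processed;
	bool requeued;   /* set when the handler reschedules itself */
};

static void requeue_handler(struct cq_sketch *cq)
{
	/* A kernel handler would queue its work item again here. */
	cq->requeued = true;
}

/* Process at most COMPLETION_BUDGET completions, then yield. */
static void completion_handler(struct cq_sketch *cq)
{
	int count = 0;

	cq->requeued = false;
	while (cq->pending > 0) {
		if (count++ >= COMPLETION_BUDGET) {
			requeue_handler(cq);
			return;
		}
		cq->pending--;
		cq->processed++;
	}
}
```

Keeping the budget a constant rather than deriving it from the queue size matches the commit's reasoning: larger queues do not lengthen each run.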
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:11 +0000 (12:16 +0200)]
IB/mlx5: Add driver cross-channel support
Add cross-channel support to the mlx5
driver. This includes the ability to ignore overrun on CQs
intended for cross-channel use, exporting the device capability, and
configuring QPs as sync master/slave queues.
A cross-channel enabled QP supports a combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQEs are supported on the send queue
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:10 +0000 (12:16 +0200)]
IB/core: Add cross-channel support
The cross-channel feature allows execution of WQEs that involve
synchronization of I/O operations on different QPs.
This capability enables programming complex flows with a single
function call, thereby significantly reducing the overhead associated
with I/O processing.
Support for cross-channel operations is indicated by HCA capability
information.
Queue pairs can be configured to work as a “sync master queue”
or a “sync slave queue”.
The added flags are:
1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the
devices that can perform cross-channel operations.
2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable the CQ overrun
check, which is useless in the cross-channel scenario.
3. QP property flags to indicate whether queues are slave or master:
* IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
are not executed immediately and require explicit enabling.
* IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
requests are not executed immediately and require explicit enabling.
* IB_QP_CREATE_CROSS_CHANNEL declares that the QP works in cross-channel
mode. If neither IB_QP_CREATE_MANAGED_SEND nor IB_QP_CREATE_MANAGED_RECV
is provided, the QP will be a sync master queue; otherwise it will be a
sync slave.
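The role selection in item 3 can be sketched as a small classifier over the create flags. The flag names mirror IB_QP_CREATE_CROSS_CHANNEL, IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV, but the bit values here are arbitrary local assumptions, and treating the managed flags as meaningless without the cross-channel flag is also an assumption.

```c
/* Local flags mirroring the IB_QP_CREATE_* names; bit values are arbitrary. */
#define QP_CROSS_CHANNEL (1u << 0)
#define QP_MANAGED_SEND  (1u << 1)
#define QP_MANAGED_RECV  (1u << 2)

enum qp_sync_role { QP_ROLE_NONE, QP_ROLE_SYNC_MASTER, QP_ROLE_SYNC_SLAVE };

static enum qp_sync_role classify_qp(unsigned int create_flags)
{
	if (!(create_flags & QP_CROSS_CHANNEL))
		return QP_ROLE_NONE;          /* not a cross-channel QP */
	if (create_flags & (QP_MANAGED_SEND | QP_MANAGED_RECV))
		return QP_ROLE_SYNC_SLAVE;    /* a queue waits to be enabled */
	return QP_ROLE_SYNC_MASTER;       /* no managed queues */
}
```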
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:09 +0000 (12:16 +0200)]
IB/core: Align coding style of ib_device_cap_flags structure
Modify enum ib_device_cap_flags such that other patches which add new
enum values pass strict checkpatch.pl checks.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:13 +0000 (20:30 +0200)]
IB/mlx5: Mmap the HCA's core clock register to user-space
To read the HCA's current cycles register from user space, we need
to map it into user space. Add support for mapping this register
via the mmap command.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:12 +0000 (20:30 +0200)]
IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Passing hca_core_clock_offset to user space is mandatory so that
user space can read the free-running clock register at the
right offset within the memory-mapped page.
This is done by changing the vendor-specific init_ucontext command
and response to an extensible form.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:11 +0000 (20:30 +0200)]
IB/mlx5: Add support for hca_core_clock and timestamp_mask
Report hca_core_clock (in kHz) and timestamp_mask in the extended
query_device verb. timestamp_mask tells users the valid range
of the raw timestamps, while hca_core_clock reports the frequency
of the clock used for timestamps.
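With hca_core_clock in kHz (cycles per millisecond) and timestamp_mask bounding the valid bits, a consumer can turn two raw timestamps into elapsed nanoseconds while tolerating one counter wraparound. This is a sketch under the assumption that the mask has the form 2^k - 1; `elapsed_ns()` is a hypothetical helper, and the 48-bit mask and clock rates in the test are illustrative values only.

```c
#include <stdint.h>

/* Elapsed nanoseconds between two raw timestamps of a free-running
 * counter running at clock_khz, whose valid bits are given by mask
 * (assumed to be of the form 2^k - 1). */
static uint64_t elapsed_ns(uint64_t t_start, uint64_t t_end,
			   uint64_t mask, uint64_t clock_khz)
{
	/* Masked subtraction handles a single wraparound of the counter. */
	uint64_t cycles = (t_end - t_start) & mask;

	/* ns = cycles * 1e6 / kHz, split to avoid overflowing 64 bits. */
	return (cycles / clock_khz) * 1000000u +
	       (cycles % clock_khz) * 1000000u / clock_khz;
}
```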
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:10 +0000 (20:30 +0200)]
IB/core: Add ib_is_udata_cleared
Extending core and vendor verb commands requires us to check that the
part of the user's command that the kernel does not understand is all zeros.
Add ib_is_udata_cleared to do so.
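The check amounts to verifying that every byte past the portion of the command the kernel understands is zero, so a newer user-space library cannot silently request a feature an older kernel would ignore. The user-space sketch below works on a plain buffer; `is_tail_cleared()` is hypothetical, whereas ib_is_udata_cleared() operates on the user buffer described by a struct ib_udata.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[offset..len) is all zero bytes, i.e. the caller
 * did not set any field this kernel does not know about. */
static bool is_tail_cleared(const unsigned char *buf, size_t len,
			    size_t offset)
{
	for (size_t i = offset; i < len; i++)
		if (buf[i] != 0)
			return false;
	return true;
}
```

A command handler would reject the request with an error when this returns false.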
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:09 +0000 (20:30 +0200)]
IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamps, mlx5 needs to
support the extended create CQ command with the timestamp flag.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:29 +0000 (08:20 -0600)]
IB/core: Display extended counter set if available
Check if the extended counters are available and if so
create the proper extended and additional counters.
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:28 +0000 (08:20 -0600)]
IB/core: Specify attribute_id in port_table_attribute
Add the attr_id to port_table_attribute since we will have to add
a different port_table_attribute for the extended attribute soon.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:27 +0000 (08:20 -0600)]
IB/core: Create get_perf_mad function in sysfs.c
Factor the existing code in get_pma_counter() out into a new
function that retrieves performance management data.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:58 +0000 (18:30 +0200)]
MAINTAINERS: Assign maintainer to Mellanox mlx4 core and IB drivers
The driver was originally written by Roland Dreier; currently there is
no official maintainer. Yishai steps in as maintainer.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@kernel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:57 +0000 (18:30 +0200)]
MAINTAINERS: Assign new maintainers to Mellanox mlx5 core and IB drivers
Matan and Leon step in as co-maintainers to replace Eli Cohen,
who wrote and maintained the core and IB drivers.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:54 +0000 (19:12 +0100)]
IB: remove the write-only usecnt field from struct ib_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:53 +0000 (19:12 +0100)]
IB: remove the struct ib_phys_buf definition
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:52 +0000 (19:12 +0100)]
ehca: stop using struct ib_phys_buf
And simplify the calling convention for full-memory registrations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:51 +0000 (19:12 +0100)]
amso1100: fold c2_reg_phys_mr into c2_get_dma_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:50 +0000 (19:12 +0100)]
nes: simplify nes_reg_phys_mr calling conventions
Just pass an address/size pair instead of an ib_phys_buf array.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:49 +0000 (19:12 +0100)]
cxgb3: simplify iwch_get_dma_wr
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:48 +0000 (19:12 +0100)]
IB: remove in-kernel support for memory windows
Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:47 +0000 (19:12 +0100)]
IB: remove support for phys MRs
We stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:46 +0000 (19:12 +0100)]
IB: remove ib_query_mr
This functionality has no users and was only supported by the staged-out
EHCA driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:45 +0000 (19:12 +0100)]
IB: start documenting device capabilities
Just IB_DEVICE_LOCAL_DMA_LKEY and IB_DEVICE_MEM_MGT_EXTENSIONS for now,
as I'm most familiar with those.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:42:54 +0000 (08:42 -0600)]
IB/IPoIB: Move multicast specific code out of ipoib_main.c
Code cleanup to move multicast specific code that checks for
a sendonly join to ipoib_multicast.c. This allows the removal
of the export of __ipoib_mcast_find().
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:42:53 +0000 (08:42 -0600)]
IB/IPoIB: factor out common multicast list removal code
Code cleanup to remove multicast-specific code from ipoib_main.c.
The removal of a list of multicast groups occurs in three places.
Create a new function ipoib_mcast_remove_list() and use it in
ipoib_main.c too.
This in turn allows dropping two functions that were
exported from ipoib_multicast.c for expiring mc groups.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:25 +0000 (18:47 +0200)]
IB/mlx5: Support RoCE
Advertise RoCE support for IB/core layer and set the hardware to
work in RoCE mode.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:24 +0000 (18:47 +0200)]
IB/mlx5: Add RoCE fields to Address Vector
Set the address handle and QP address path fields according to the
link layer type (IB/Eth).
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:23 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callbacks for adding/deleting GIDs
These callbacks write into the mlx5 RoCE address table.
On del_gid, we write a zeroed GID.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:22 +0000 (18:47 +0200)]
IB/mlx5: Set network_hdr_type upon RoCE responder completion
When handling a responder completion, if the link layer is Ethernet,
set the work completion network_hdr_type field according to the CQE's
info and the IB_WC_WITH_NETWORK_HDR_TYPE flag.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:21 +0000 (18:47 +0200)]
IB/mlx5: Extend query_device/port to support RoCE
Use the vport access functions to retrieve the Ethernet-specific
information and return it in ib_query_device and
ib_query_port.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:20 +0000 (18:47 +0200)]
net/mlx5_core: Introduce access functions to query vport RoCE fields
Introduce access functions to query NIC vport system_image_guid,
node_guid and qkey_viol_cntr.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:19 +0000 (18:47 +0200)]
net/mlx5_core: Introduce access functions to enable/disable RoCE
An mlx5 Ethernet port must be explicitly enabled for RoCE.
When RoCE is not enabled on the port, the NIC will refuse to create
QPs attached to it, and incoming RoCE packets will be treated by the
NIC as plain Ethernet packets.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:18 +0000 (18:47 +0200)]
net/mlx5_core: Break down the vport mac address query function
Introduce a new function called mlx5_query_nic_vport_context().
This function gets all the NIC vport attributes from the device.
The MAC address is just one of the NIC vport attributes, so
mlx5_query_nic_vport_mac_address() is now just a wrapper function
around mlx5_query_nic_vport_context().
More NIC vport attributes will be used in following commits.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:17 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callback for getting its netdev
For Eth ports only:
Maintain a net device pointer in mlx5_ib_device and update it
upon NETDEV_REGISTER and NETDEV_UNREGISTER events if the
net-device and IB device have the same PCI parent device.
Implement the get_netdev callback to return this net device.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Achiad Shochat [Wed, 23 Dec 2015 16:47:16 +0000 (18:47 +0200)]
IB/mlx5: Support IB device's callback for getting the link layer
Make the existing mlx5_ib_port_link_layer() signature match
the ib device callback signature (add a port_num parameter).
Refactor it to use a helper function so that the link layer can
also be queried before the ibdev is created.
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 19 Dec 2015 20:48:59 +0000 (21:48 +0100)]
IB/usnic: delete unneeded IS_ERR test
kzalloc doesn't return ERR_PTR, so there is no need to test for it.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e;
@@
* x = kzalloc(...)
... when != x = e
* IS_ERR_OR_NULL(x)
// </smpl>
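kzalloc() reports failure by returning NULL, while IS_ERR() only recognizes the error pointers produced by ERR_PTR(), which occupy the top MAX_ERRNO (4095) addresses. The IS_ERR half of IS_ERR_OR_NULL() can therefore never fire on a kzalloc() result, and a plain NULL check is enough. The user-space re-creation below is a simplified sketch of those helpers, with a `sketch_` prefix because it is not the kernel header; converting an integer to a pointer is implementation-defined but behaves as shown on common platforms.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the last MAX_ERRNO addresses; NULL is not one. */
static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```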
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:19 +0000 (10:42 -0800)]
IB/usnic: Handle 0 counts in resource allocation
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:18 +0000 (10:42 -0800)]
IB/usnic: Fix resource leak in error case
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:17 +0000 (10:42 -0800)]
IB/usnic: Support more QP state transitions
They were already implemented at a lower layer, but the upper level
routine placed arbitrary restrictions on which transitions were
permitted. Simplify the state machine logic to live wholly in
usnic_ib_qp_grp_modify.
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:16 +0000 (10:42 -0800)]
IB/usnic: Fix message typo
Signed-off-by: Dave Goodell <dgoodell@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Xuyang Wang <xuywang@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:15 +0000 (10:42 -0800)]
IB/usnic: Fix incorrect cast in usnic_ib_fw_string_to_u64
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:42:14 +0000 (10:42 -0800)]
IB/usnic: Improve a failure message
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Nelson Escobar [Wed, 9 Dec 2015 18:39:49 +0000 (10:39 -0800)]
IB/usnic: Remove unused prototype
query_protocol() was added in commit 6b90a6d66b17 ("IB/Verbs: Implement
new callback query_protocol()") and then removed in commit f9b22e355d38
("IB/core: Convert core to use bitfield for caps").
This left behind an unused prototype.
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>