Haggai Abramovsky [Thu, 14 Jan 2016 17:12:57 +0000 (19:12 +0200)]
IB/mlx5: Add CQE version 1 support to user QPs and SRQs
Enforce working with CQE version 1 when the user supports CQE
version 1 and asked to work this way.
If the user still works with CQE version 0, then use the default
CQE version to tell the Firmware that the user still works in the
older mode.
After this patch, the kernel still reports CQE version 0.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Abramovsky [Thu, 14 Jan 2016 17:12:56 +0000 (19:12 +0200)]
IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
The wrong buffer size was passed to ib_is_udata_cleared.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
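The kind of validation involved in the commit above can be illustrated in plain C. This is a minimal userspace sketch, not the mlx5 code: tail_is_cleared() and its parameters are hypothetical names, and the commit message does not say which size was passed incorrectly. The idea is that any bytes a caller supplies beyond the part of the request the kernel understands must be zero, so the length given to the check has to describe exactly that tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true when every byte of buf in the range
 * [known_len, total_len) is zero. known_len is the size of the request
 * layout the receiver understands; total_len is what the caller supplied.
 */
static bool tail_is_cleared(const unsigned char *buf, size_t total_len,
                            size_t known_len)
{
    assert(buf != NULL);

    /* Nothing beyond the known layout: trivially cleared. */
    if (total_len <= known_len)
        return true;

    for (size_t i = known_len; i < total_len; i++) {
        if (buf[i] != 0)
            return false;
    }
    return true;
}
```

Passing the wrong length to such a check either skips bytes that should have been validated or inspects bytes that are legitimately non-zero.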
Kaike Wan [Thu, 21 Jan 2016 13:41:31 +0000 (08:41 -0500)]
IB/sa: Fix netlink local service GFP crash
The rdma netlink local service registers a handler to handle RESOLVE
response and another handler to handle SET_TIMEOUT request. The first
thing these handlers do is to call netlink_capable() to check the
access right of the received skb to make sure that the sender has root
access. Under normal conditions, such responses and requests will be
directly forwarded to the handlers without going through the netlink_dump
pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c).
However, a user application could send a RESOLVE request (not response)
to the local service, which will fall into the netlink_dump pathway,
where a new skb will be created without initializing the control block.
This new skb will be eventually forwarded to the local service RESOLVE
response handler. Unfortunately, netlink_capable() will cause a general
protection fault if the skb's control block is not initialized. This
patch addresses the problem by checking the skb first.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Tue, 12 Jan 2016 11:19:50 +0000 (13:19 +0200)]
IB/srpt: Remove redundant wc array
No usage after the conversion to the new CQ API.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 24 Dec 2015 16:19:23 +0000 (11:19 -0500)]
IB/qib: Improve ipoib UD performance
Based on profiling, UD performance drops in case of processes
in a single client due to excess context switches when
the progress workqueue is scheduled.
This is solved by modifying the heuristic to select direct
progress instead of scheduling progress via the workqueue
when UD-like situations are detected.
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 14 Jan 2016 15:50:43 +0000 (17:50 +0200)]
IB/mlx4: Advertise RoCE v2 support
Advertise RoCE v2 support in port_immutable attributes according to
the hardware's capabilities. This enables the verbs stack to use
RoCE v2 mode.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:42 +0000 (17:50 +0200)]
IB/mlx4: Create and use another QP1 for RoCEv2
The mlx4 driver uses a special QP to implement the GSI QP. This kind
of QP allows building the InfiniBand headers in software.
When mlx4 hardware builds the packet, it calculates the ICRC and puts
it at the end of the payload. However, this ICRC calculation depends
on the QP configuration, which is determined when the QP is modified
(roce_mode during INIT->RTR).
When receiving a packet, the ICRC verification doesn't depend on this
configuration.
Therefore, two GSI QPs for send (one for each RoCE version) and
one GSI QP for receive are required.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:41 +0000 (17:50 +0200)]
IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
RoCEv2 packets are sent over IP/UDP protocols.
The mlx4 driver uses a type of RAW QP to send packets for QP1 and
therefore needs to build the network headers below BTH in software.
This patch adds an option to build QP1 packets with IP and UDP headers if
RoCEv2 is requested.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:40 +0000 (17:50 +0200)]
IB/mlx4: Enable RoCE v2 when the IB device is added
If the hardware supports RoCE v2, we configure the hardware UDP
port according to the RoCE v2 Annex when mlx4_ib device is added.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:39 +0000 (17:50 +0200)]
IB/mlx4: Support modify_qp for RoCE v2
In order to support modify_qp for RoCE v2, we need to set
the gid_type in the QP context.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:38 +0000 (17:50 +0200)]
IB/core: Add definition for the standard RoCE V2 UDP port
This will be used in hardware device drivers when building QP or AH
contexts.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:37 +0000 (17:50 +0200)]
net/mlx4_core: Add support for RoCE v2 entropy
In RoCE v2 we need to choose a source UDP port. We do so by using
entropy over the source and destination QPNs.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
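To make the idea of deriving a source port from the two QPNs concrete, here is a minimal sketch in plain C. It is not the mlx4 algorithm, which the commit message does not describe: roce_v2_sport(), SPORT_MIN and the folding step are hypothetical, and keeping the result inside the dynamic port range (0xC000 to 0xFFFF) is an assumption of this example. The destination port 4791 is the IANA-assigned RoCE v2 port.

```c
#include <assert.h>
#include <stdint.h>

#define ROCE_V2_UDP_DPORT 4791u   /* IANA-assigned RoCE v2 destination port */
#define SPORT_MIN         0xC000u /* assumption: stay in the dynamic port range */

/*
 * Hypothetical entropy function: mix the 24-bit source and destination
 * QP numbers and fold the result into the low 14 bits of the port.
 */
static uint16_t roce_v2_sport(uint32_t src_qpn, uint32_t dst_qpn)
{
    /* QP numbers are 24 bits wide. */
    assert((src_qpn >> 24) == 0 && (dst_qpn >> 24) == 0);

    uint32_t x = src_qpn ^ dst_qpn; /* symmetric in the two QPNs */
    x ^= x >> 14;                   /* fold the upper bits into the lower 14 */

    return (uint16_t)(SPORT_MIN | (x & 0x3FFFu));
}
```

Because the mix is a plain XOR, both directions of a connection map to the same port in this sketch. A real implementation may prefer a different mix.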
Moni Shoua [Thu, 14 Jan 2016 15:50:36 +0000 (17:50 +0200)]
net/mlx4_core: Add support for configuring RoCE v2 UDP port
In order to support RoCE v2, the hardware needs to be configured
to classify certain UDP packets as RoCE v2 packets and pass them
through its RoCE pipeline. This patch enables configuring this
UDP port.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:35 +0000 (17:50 +0200)]
IB/mlx4: Add support for setting RoCEv2 gids in hardware
To tell hardware about a gid with type RoCEv2, software needs a new
modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can
replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 gids.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:34 +0000 (17:50 +0200)]
net/mlx4_core: Configure mlx4 hardware for mixed RoCE v1/v2 modes
If the hardware supports RoCE v2 (mixed with RoCE v1) mode, we enable
it. This is necessary in order to support RoCE v2.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:33 +0000 (17:50 +0200)]
IB/mlx4: Add gid_type to GID properties
The IB core adds a gid_type property to struct ib_gid_attr.
The mlx4 driver should take it into consideration when modifying or
querying the hardware gid table.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:50:32 +0000 (17:50 +0200)]
net/mlx4: Query RoCE support
Query the RoCE support from firmware using the appropriate firmware
commands. Downstream patches will read these capabilities and act
accordingly.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Fri, 8 Jan 2016 07:53:41 +0000 (23:53 -0800)]
svc_rdma: use local_dma_lkey
We now always have a per-PD local_dma_lkey available. Make use of that
fact in svc_rdma and stop registering our own MR.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:50:10 +0000 (14:50 -0500)]
svcrdma: Add class for RDMA backwards direction transport
To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:50:02 +0000 (14:50 -0500)]
svcrdma: Define maximum number of backchannel requests
Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.
The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:53 +0000 (14:49 -0500)]
svcrdma: Make map_xdr non-static
Pre-requisite to use map_xdr in the backchannel code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:45 +0000 (14:49 -0500)]
svcrdma: Remove last two __GFP_NOFAIL call sites
Clean up.
These functions can otherwise fail, so check for page allocation
failures too.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:37 +0000 (14:49 -0500)]
svcrdma: Add gfp flags to svc_rdma_post_recv()
svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.
Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:28 +0000 (14:49 -0500)]
svcrdma: Remove unused req_map and ctxt kmem_caches
Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:20 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_req_map
To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:49:12 +0000 (14:49 -0500)]
svcrdma: Improve allocation of struct svc_rdma_op_ctxt
When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d915c ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.
Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.
Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
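The pre-allocation strategy described above can be sketched as a per-connection free list. This is a single-threaded userspace illustration, not the svcrdma code: struct conn, struct ctxt and the conn_*() helpers are hypothetical names, and a kernel version would need locking around the list. The point is that all allocation happens once at connection setup, so taking a context later never calls the allocator and never sleeps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ctxt {
    struct ctxt *next;   /* free-list link */
    char payload[64];    /* stand-in for the real context contents */
};

struct conn {
    struct ctxt *free_list; /* contexts currently available */
    struct ctxt *storage;   /* backing array, freed at teardown */
};

/* Allocate every context up front; this is the only place that can fail. */
static int conn_init(struct conn *c, size_t nctxts)
{
    assert(c != NULL && nctxts > 0);

    c->storage = calloc(nctxts, sizeof(*c->storage));
    if (!c->storage)
        return -1;

    c->free_list = NULL;
    for (size_t i = 0; i < nctxts; i++) {
        c->storage[i].next = c->free_list;
        c->free_list = &c->storage[i];
    }
    return 0;
}

/* Pop a context; no allocation, so this cannot sleep. NULL if exhausted. */
static struct ctxt *conn_get_ctxt(struct conn *c)
{
    struct ctxt *x = c->free_list;

    if (x)
        c->free_list = x->next;
    return x;
}

/* Return a context to the pool for reuse. */
static void conn_put_ctxt(struct conn *c, struct ctxt *x)
{
    x->next = c->free_list;
    c->free_list = x;
}

static void conn_destroy(struct conn *c)
{
    free(c->storage);
    c->storage = NULL;
    c->free_list = NULL;
}
```

Sizing the pool to the maximum number of outstanding requests is what turns "cannot sleep" into "cannot fail" for callers.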
Chuck Lever [Thu, 7 Jan 2016 19:49:03 +0000 (14:49 -0500)]
svcrdma: Clean up process_context()
Be sure the completed ctxt is put in every path.
The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.
Remove/disable debugging.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chuck Lever [Thu, 7 Jan 2016 19:48:55 +0000 (14:48 -0500)]
svcrdma: Clean up rdma_create_xprt()
kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Mon, 4 Jan 2016 08:49:54 +0000 (10:49 +0200)]
IB/core: Use hop-limit from IP stack for RoCE
Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fix that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Mon, 4 Jan 2016 08:49:53 +0000 (10:49 +0200)]
IB/core: Rename rdma_addr_find_dmac_by_grh
rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a
downstream patch will also add hop_limit as an output parameter,
so rename it to rdma_addr_find_l2_eth_by_grh.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 1 Jan 2016 12:17:46 +0000 (13:17 +0100)]
IB/cm: Fix a recently introduced deadlock
ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.
This patch fixes e.g. the following deadlock:
Acked-by: Erez Shitrit <erezsh@mellanox.com>
=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
[<ffffffff810a3c11>] mark_held_locks+0x71/0x90
[<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
[<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
[<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
[<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
[<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
[<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
[<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
[<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
[<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
[<ffffffff8107184d>] process_one_work+0x1bd/0x460
[<ffffffff81073148>] worker_thread+0x118/0x420
[<ffffffff81078454>] kthread+0xe4/0x100
[<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&cm_id_priv->lock)->rlock);
<Interrupt>
lock(&(&cm_id_priv->lock)->rlock);
*** DEADLOCK ***
no locks held by swapper/8/0.
stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
<IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74
[<ffffffff810a32f4>] print_usage_bug+0x184/0x190
[<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
[<ffffffff810a3995>] mark_lock+0x115/0x1b0
[<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
[<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
[<ffffffff810a53c2>] lock_acquire+0x62/0x80
[<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
[<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
[<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
[<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
[<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
[<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
[<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
[<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
[<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
[<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
[<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
[<ffffffff81007f3d>] handle_irq+0x5d/0x130
[<ffffffff810071fd>] do_IRQ+0x6d/0x130
[<ffffffff8151d309>] common_interrupt+0x89/0x89
<EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
[<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
[<ffffffff810990d6>] call_cpuidle+0x36/0x60
[<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
[<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
[<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
[<ffffffff8103c443>] start_secondary+0x83/0x90
Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
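The reason an unconditional interrupt enable is unsafe inside a helper can be shown with a small model of the CPU interrupt flag. This is a userspace sketch only: the model_*() functions are hypothetical stand-ins for the kernel's spin_lock_irq()/spin_unlock_irq() and spin_lock_irqsave()/spin_unlock_irqrestore() pairs, they track just the interrupt state, and the inner and outer locks are assumed to be two different locks.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the per-CPU "interrupts enabled" flag. */
static bool irqs_enabled = true;

/* _irq variants: disable on lock, enable unconditionally on unlock. */
static void model_spin_lock_irq(void)   { irqs_enabled = false; }
static void model_spin_unlock_irq(void) { irqs_enabled = true; }

/* _irqsave variants: remember the previous state and restore it. */
static bool model_spin_lock_irqsave(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

static void model_spin_unlock_irqrestore(bool flags)
{
    irqs_enabled = flags;
}

/*
 * The caller holds an outer lock with interrupts disabled and then runs a
 * helper that takes an inner lock. Returns true if interrupts ended up
 * enabled while the outer lock was still held.
 */
static bool helper_leaks_irq_enable(bool helper_uses_irqsave)
{
    assert(irqs_enabled);

    bool outer = model_spin_lock_irqsave();   /* caller: IRQs now off */

    if (helper_uses_irqsave) {
        bool inner = model_spin_lock_irqsave();
        model_spin_unlock_irqrestore(inner);  /* restores "off" */
    } else {
        model_spin_lock_irq();
        model_spin_unlock_irq();              /* forces "on": the bug */
    }

    bool leaked = irqs_enabled;

    model_spin_unlock_irqrestore(outer);
    return leaked;
}
```

In the leaking case an interrupt handler that takes the outer lock can run on the same CPU while that lock is held, which is the scenario lockdep reports in the trace above.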
Bart Van Assche [Thu, 31 Dec 2015 08:56:43 +0000 (09:56 +0100)]
IB/srpt: Fix the RDMA completion handlers
Avoid that the following kernel crash is triggered when processing
an RDMA completion:
BUG: unable to handle kernel paging request at 0000000100000198
IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
Call Trace:
[<ffffffff810a53c2>] lock_acquire+0x62/0x80
[<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
[<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
[<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
[<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
[<ffffffff8107184d>] process_one_work+0x1bd/0x460
[<ffffffff81073148>] worker_thread+0x118/0x420
[<ffffffff81078454>] kthread+0xe4/0x100
[<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
Fixes: commit 59fae4deaad3 ("IB/srpt: chain RDMA READ/WRITE requests").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Thu, 31 Dec 2015 08:56:03 +0000 (09:56 +0100)]
irq_poll: Fix irq_poll_sched()
The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing.
This means that irq_poll_sched() must proceed if this bit has
not yet been set.
Fixes: commit ea51190c0315 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched").
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Wed, 30 Dec 2015 14:14:18 +0000 (16:14 +0200)]
IB/core: Fix dereference before check
Sparse complains about dereference before check. Fix this by
moving the check before the dereference.
Fixes: 200298326b27 ('IB/core: Validate route when we init ah')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
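The pattern behind this class of warning is easy to show in isolation. The following is a generic sketch, not the IB core code: struct route and route_hop_limit() are hypothetical. A static checker complains when a pointer is dereferenced first and compared against NULL afterwards, because the later check then implies the earlier dereference could have crashed.

```c
#include <assert.h>
#include <stddef.h>

struct route {
    int hop_limit;
};

/*
 * Correct ordering: test the pointer before touching any of its fields.
 * The buggy ordering would read rt->hop_limit into a local first and only
 * then check rt for NULL.
 */
static int route_hop_limit(const struct route *rt, int fallback)
{
    assert(fallback >= 0 && fallback <= 255);

    if (!rt)
        return fallback;  /* check first */

    return rt->hop_limit; /* dereference only after the check */
}
```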
Matan Barak [Wed, 30 Dec 2015 14:14:17 +0000 (16:14 +0200)]
IB/core: Eliminate sparse false context imbalance warning
When write_gid function needs to do a sleep-able operation, it unlocks
table->rwlock and then relocks it. Sparse complains about context
imbalance.
This is safe as write_gid is always called with table->rwlock.
write_gid protects from simultaneous writes to this GID entry
by setting the GID_TABLE_ENTRY_INVALID flag.
Fixes: 9c584f049596 ('IB/core: Change per-entry lock in RoCE GID table to
one lock')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hal Rosenstock [Tue, 29 Dec 2015 10:43:43 +0000 (05:43 -0500)]
IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling
Port number is not part of ClassPortInfo attribute but is
still needed as a parameter when invoking process_mad.
To properly handle this attribute, port_num is added as a
parameter to get_counter_table and get_perf_mad was changed
not to store port_num in the attribute itself when it's
querying the ClassPortInfo attribute.
This addresses an issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>
Fixes: 145d9c541032 ('IB/core: Display extended counter set if available')
Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Acked-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 29 Dec 2015 09:45:03 +0000 (10:45 +0100)]
IB/core: Remove set-but-not-used variable from ib_sg_to_pages()
Detected this by building the IB core with W=1. See also patch
"IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a).
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sat, 9 Jan 2016 11:06:25 +0000 (13:06 +0200)]
IB/mlx5: Fix passing casted pointer in mlx5_query_port_roce
Fix static checker warning:
drivers/infiniband/hw/mlx5/main.c:149 mlx5_query_port_roce()
warn: passing casted pointer '&props->qkey_viol_cntr' to
'mlx5_query_nic_vport_qkey_viol_cntr()' 32 vs 16.
Fixes: 3f89a643eb29 ("IB/mlx5: Extend query_device/port to support RoCE")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 6 Jan 2016 06:46:12 +0000 (22:46 -0800)]
IB/mad: use CQ abstraction
Remove the local workqueue to process mad completions and use the CQ API
instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Mon, 4 Jan 2016 13:15:58 +0000 (14:15 +0100)]
IB/mad: pass ib_mad_send_buf explicitly to the recv_handler
Stop abusing wr_id and just pass the parameter explicitly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Lucas Tanure [Tue, 19 Jan 2016 14:06:30 +0000 (12:06 -0200)]
infiniband: Replace memset with eth_zero_addr
Use eth_zero_addr to assign the zero address to the given address
array instead of calling memset with a zero second argument.
Signed-off-by: Lucas Tanure <tanure@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
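For readers outside the kernel tree, the replacement amounts to the following. This is a userspace stand-in: eth_zero_addr_model() is a hypothetical name mirroring the kernel helper, and ETH_ALEN is defined locally as the 6-byte Ethernet address length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of an Ethernet (MAC) address in bytes */

/* Userspace model of the helper: set a MAC address to 00:00:00:00:00:00. */
static inline void eth_zero_addr_model(uint8_t *addr)
{
    assert(addr != NULL);
    memset(addr, 0x00, ETH_ALEN);
}
```

The helper states the intent and fixes the length in one place, so a call site cannot pass the wrong size.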
Leon Romanovsky [Tue, 19 Jan 2016 09:11:24 +0000 (11:11 +0200)]
IB/mlx5: Delete locally redefined variable
Fix the following sparse warning:
drivers/infiniband/hw/mlx5/main.c:1061:29: warning: symbol 'pfn' shadows
an earlier one
drivers/infiniband/hw/mlx5/main.c:1030:21: originally declared here
Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:48:07 +0000 (17:48 +0200)]
net/mlx4: Remove unused macro
The macro mlx4_foreach_non_ib_transport_port() is not used anywhere. Remove it.
Fixes: aa9a2d51a3e7 ("mlx4: Activate RoCE/SRIOV")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 14 Jan 2016 15:47:38 +0000 (17:47 +0200)]
IB/mlx4: Take source mac from AH instead from the port
In commit dbf727de7440 ("IB/core: Use GID table in AH creation and dmac
resolution") we copy the source mac to mlx4_ah from the attributes of the
gid at ib_ah_attr.grh.sgid_index. Now we can use it.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 14 Jan 2016 15:47:02 +0000 (17:47 +0200)]
IB/mlx4: Initialize hop_limit when creating address handle
The hop limit value wasn't copied from the attributes when the ah was
created. This may cause packets for unconnected services to be dropped by
routers when the endpoints are not in the same subnet.
Fixes: fa417f7b520e ("IB/mlx4: Add support for IBoE")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 14 Jan 2016 06:11:40 +0000 (08:11 +0200)]
IB/mlx5: Expose correct maximum number of CQE capacity
Maximum number of EQE capacity per CQ was mistakenly exposed
as CQE. Fix that.
Fixes: 938fe83c8dcb ("net/mlx5_core: New device capabilities handling")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Wed, 13 Jan 2016 04:33:14 +0000 (10:03 +0530)]
iw_cxgb4: Take clip reference before starting IPv6 listen
The h/w is designed in such a way that, if you do anything IPv6
related, a valid clip entry must be there. So take clip reference
before creating IPv6 listening servers, and then if we fail to
create server, release the clip entry.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Tue, 12 Jan 2016 11:03:22 +0000 (16:33 +0530)]
iw_cxgb4: Fixes GW-Basic labels to meaningful error names
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Tue, 12 Jan 2016 11:03:21 +0000 (16:33 +0530)]
iw_cxgb4: Fixes static checker warning in c4iw_rdev_open()
Commit c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
from Dec 11, 2015, leads to the following static checker warning:
drivers/infiniband/hw/cxgb4/device.c:857 c4iw_rdev_open()
warn: variable dereferenced before check 'rdev->status_page'
Also, we weren't deallocating the ocqp pool in the error path when the
status page allocation failed. Fix that too.
Fixes: c5dfb000b904 ("iw_cxgb4: Pass qid range to user space driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Tue, 12 Jan 2016 09:29:21 +0000 (12:29 +0300)]
IB/cma: allocating too much memory in make_cma_ports()
The issue here is that there is a cut and paste bug. When we allocate
cma_dev_group->default_ports_group we use "sizeof(*cma_dev_group->ports)"
instead of "sizeof(*cma_dev_group->default_ports_group)".
We're bumping up against the 80 character limit so I introduced a new
local pointer "ports_group" to get around that.
Fixes: 045959db65c6 ('IB/cma: Add configfs for rdma_cm')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
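The cut-and-paste bug described above is a mismatch between the pointer being assigned and the type named in sizeof. A minimal sketch of the safer idiom follows. The structure definitions are hypothetical stand-ins with made-up sizes, and only the local name ports_group comes from the commit message. Tying sizeof to the very pointer that receives the allocation makes the two hard to get out of sync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct port_group    { char state[256]; }; /* stand-in for the larger type */
struct default_group { int id; };          /* stand-in for the smaller type */

struct dev_group {
    struct port_group *ports;
    struct default_group *default_ports_group;
};

/* Allocate the default groups using sizeof(*ptr) of the destination pointer. */
static int alloc_default_groups(struct dev_group *g, size_t nports)
{
    struct default_group *ports_group; /* short local keeps lines under 80 columns */

    assert(g != NULL && nports > 0);

    ports_group = calloc(nports, sizeof(*ports_group));
    if (!ports_group)
        return -1;

    g->default_ports_group = ports_group;
    return 0;
}

/* Bytes over-allocated per entry if sizeof(*g->ports) were used by mistake. */
static size_t wasted_bytes_per_entry(void)
{
    return sizeof(struct port_group) - sizeof(struct default_group);
}
```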
Dan Carpenter [Tue, 12 Jan 2016 09:27:43 +0000 (12:27 +0300)]
RDMA/nes: checking for NULL instead of IS_ERR
nes_reg_phys_mr() returns ERR_PTRs on error. It doesn't return NULL.
This bug has been there for a while, but we recently changed from
calling a function pointer to calling nes_reg_phys_mr() directly so now
Smatch is able to detect the bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
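The difference between a NULL check and an IS_ERR check can be reproduced in userspace. The helpers below are a simplified model of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, in which small negative errno values are encoded in the top of the pointer range. err_ptr(), is_err(), ptr_err() and reg_mr_model() are hypothetical names for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if p is an encoded error, i.e. lies in the last MAX_ERRNO addresses. */
static inline bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an encoded pointer. */
static inline long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static int dummy_mr; /* stands in for a successfully registered object */

/* Model of a function that returns an error pointer, never NULL, on failure. */
static void *reg_mr_model(bool fail)
{
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_mr;
}
```

A caller that only tests for NULL treats the encoded error as a valid object, so the check has to be is_err() (IS_ERR() in the kernel).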
Vinit Agnihotri [Mon, 11 Jan 2016 17:57:25 +0000 (12:57 -0500)]
IB/qib: Support creating qps with GFP_NOIO flag
The current code is problematic when QP creation and ipoib are used to
support NFS and NFS needs to do IO for paging purposes. In that case, the
GFP_KERNEL allocation in qib_qp.c causes a deadlock in tight memory
situations.
This fix adds support for creating a queue pair with the GFP_NOIO flag, for
connected mode only, so that queue pair creation fails cleanly in those
situations.
Cc: <stable@vger.kernel.org> # 3.16+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Mon, 4 Jan 2016 03:44:25 +0000 (22:44 -0500)]
IB/sysfs: Fix sparse warning on attr_id
The attribute ID was declared as an int while the value should really be a
big-endian 16-bit value.
Fixes: 35c4cbb17811 ("IB/core: Create get_perf_mad function in sysfs.c")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
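Why the declared type matters can be seen from how a 16-bit big-endian value is placed on the wire. This is a userspace sketch using the POSIX htons() conversion, and put_attr_id() is a hypothetical helper. In the kernel the same intent is expressed with the __be16 type, which lets sparse flag accidental mixing of host-order and wire-order values.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order attribute ID as two big-endian bytes. */
static void put_attr_id(uint8_t wire[2], uint16_t attr_id_host)
{
    uint16_t be;

    assert(wire != NULL);

    be = htons(attr_id_host); /* host order to big-endian */
    memcpy(wire, &be, sizeof(be));
}
```

With a plain int the compiler and sparse cannot tell whether the value has already been converted, which is what the warning was about.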
Devesh Sharma [Thu, 24 Dec 2015 18:14:08 +0000 (13:14 -0500)]
RDMA/be2net: Remove open and close entry points
Recently Doug Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.
A. be2net is sending administrative open/close event to ocrdma holding
device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
So sequence of locks is rtnl_lock---> device_list lock
B. When new ocrdma roce device gets registered, infiniband stack now
takes rtnl_lock in ib_register_device() in GID initialization routines.
So sequence of locks in this path is device_list lock ---> rtnl_lock.
This improper locking sequence causes deadlock.
In order to resolve the above deadlock condition, ocrdma introduced a
patch to stop listening to administrative open/close events generated from
be2net driver. It now depends on link-state-change async-event generated from
CNA. This change leaves behind dead code which used to generate administrative
open/close events. This patch cleans up all that dead code from be2net.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:07 +0000 (13:14 -0500)]
RDMA/ocrdma: Depend on async link events from CNA
Recently Doug Ledford reported a deadlock happening
between ocrdma-load sequence and NetworkManager service
issuing "open" on be2net interface.
The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.
A. be2net is sending administrative open/close event to ocrdma holding
device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
So the sequence of locks is rtnl_lock ---> device_list lock.
B. When new ocrdma roce device gets registered, infiniband stack now
takes rtnl_lock in ib_register_device() in GID initialization routines.
So sequence of locks in this path is device_list lock ---> rtnl_lock.
This improper locking sequence causes deadlock.
With this patch we stop using administrative open and close events
injected by be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements logic
to receive async-link-events generated by the CNA whenever a
link-state-change is detected. From now on, these async-events will be
used to dispatch PORT_ACTIVE and PORT_ERROR events to the IB-stack.
Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.
Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:06 +0000 (13:14 -0500)]
RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Thu, 24 Dec 2015 18:14:05 +0000 (13:14 -0500)]
RDMA/ocrdma: Fix vlan-id assignment in qp parameters
vlan-id is wrongly set to 0 when PFC is enabled.
Set the vlan-id configured by the user in the QP parameters.
In case a vlan interface is not used, flash a warning to the
user to configure vlan and assign vlan-id as 0 in the qp params.
Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 7 Jan 2016 09:19:29 +0000 (11:19 +0200)]
IB/cma: Fix RDMA port validation for iWarp
cma_validate_port wrongly assumed that Ethernet devices are RoCE
devices and thus their ndev should be matched in the GID table.
This broke the iWarp support. Fix that by matching the ndev only if
we work on a RoCE port.
Cc: <stable@vger.kernel.org> # 4.4.x-
Fixes: abae1b71dd37 ('IB/cma: cma_validate_port should verify the port and netdevice')
Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 7 Jan 2016 21:44:10 +0000 (16:44 -0500)]
IB/qib: fix mcast detach when qp not attached
The code produces the following trace:
[1750924.419007] general protection fault: 0000 [#3] SMP
[1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4 dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
[1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D 3.13.0-39-generic #66-Ubuntu
[1750924.420364] Hardware name: Dell Computer Corporation PowerEdge 860/0XM089, BIOS A04 07/24/2007
[1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti: ffff88007af1c000
[1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.420364] RSP: 0018:ffff88007af1dd70 EFLAGS: 00010246
[1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX: 000000000000000f
[1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI: 6764697200000000
[1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09: 0000000000000000
[1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12: ffff88007baa1d98
[1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15: 0000000000000000
[1750924.420364] FS: 00007ffff7fd8740(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
[1750924.420364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4: 00000000000007e0
[1750924.420364] Stack:
[1750924.420364] ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429 000000007af1de20
[1750924.420364] ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70 ffffffffa00cb313
[1750924.420364] 00007fffffffde88 0000000000000000 0000000000000008 ffff88003ecab000
[1750924.420364] Call Trace:
[1750924.420364] [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350 [ib_qib]
[1750924.568035] [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0 [ib_uverbs]
[1750924.568035] [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
[1750924.568035] [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170 [ib_uverbs]
[1750924.568035] [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
[1750924.568035] [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
[1750924.568035] [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
[1750924.568035] [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
[1750924.568035] [<ffffffff811bdc49>] SyS_write+0x49/0xa0
[1750924.568035] [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
[1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10 <f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
[1750924.568035] RIP [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50 [ib_qib]
[1750924.568035] RSP <ffff88007af1dd70>
[1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]
The fix is to note the qib_mcast_qp that was found. If none is found, then
return EINVAL indicating the error.
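A minimal sketch of that pattern follows, assuming a plain singly linked list rather than the driver's real data structures. struct mcast_qp_entry and mcast_detach() are hypothetical names chosen for the illustration; only the idea (remember the entry that matched, return -EINVAL when nothing matched) comes from the description above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an attached-QP list entry. */
struct mcast_qp_entry {
	int qp_num;
	struct mcast_qp_entry *next;
};

/*
 * Unlink the entry for `qp_num` from the list and hand it back through
 * `found`. Returns 0 on success, or -EINVAL when the QP was never
 * attached, so the caller never frees an entry that was not found.
 */
int mcast_detach(struct mcast_qp_entry **head, int qp_num,
		 struct mcast_qp_entry **found)
{
	struct mcast_qp_entry **pp;

	*found = NULL;
	for (pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->qp_num == qp_num) {
			*found = *pp;          /* note the entry that matched */
			*pp = (*pp)->next;     /* unlink it from the list */
			(*found)->next = NULL;
			break;
		}
	}
	return *found ? 0 : -EINVAL;
}
```

The caller frees `*found` only when the return value is 0, which is the property whose absence led to the trace above.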
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Erez Shitrit [Thu, 7 Jan 2016 07:28:08 +0000 (09:28 +0200)]
IB/IPoIB: Fix kernel panic on multicast flow
ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
parameter mcast->dev. That mcast is a temporary (used as an iterator)
variable that may be uninitialized.
There is no need to send the variable dev to the function, as each mcast
has its dev as a member in the mcast struct.
This causes the following panic:
RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
RSP: 0018: EFLAGS: 00010246
RAX: f0201 RBX: 24e00 RCX: 00000
....
....
Stack:
Call Trace:
ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
process_one_work+0x164/0x470
worker_thread+0x11d/0x420
...
Fixes: 5a0e81f6f483 ('IB/IPoIB: factor out common multicast list removal code')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reported-by: Doron Tsur <doront@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Thu, 24 Dec 2015 10:20:48 +0000 (12:20 +0200)]
IB/iser: Support the remote invalidation exception
Declare that we support remote invalidation in case we are:
1. using fastreg method
2. always registering memory
Detect the invalidated rkey from the work completion info so we
won't invalidate it locally. The spec mandates that we must not rely
on the target to remotely invalidate our rkey, so we must check it upon
a receive (scsi response) completion.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:07 +0000 (14:12 +0200)]
IB/iser: Change the increment rkey flow logic
When we enable remote invalidate support we won't want to perform
local invalidates at the same time we do today, but we still need
to get new rkeys. So, decouple the rkey update from the local
invalidate and tie it to memory reg instead.
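For illustration, getting a new rkey can be sketched as bumping the key byte of the current one. This assumes the usual verbs key layout, where the low 8 bits of an rkey are the consumer-owned key and the upper 24 bits are the index; inc_rkey() is a hypothetical userspace stand-in, not the iser code.

```c
#include <stdint.h>

/*
 * Return the next rkey: increment the low 8 bits (the key portion,
 * wrapping from 0xff to 0x00) and keep the upper 24 bits unchanged.
 * The 8/24 bit split is an assumption stated in the text above.
 */
uint32_t inc_rkey(uint32_t rkey)
{
	const uint32_t mask = 0x000000ff;

	return ((rkey + 1) & mask) | (rkey & ~mask);
}
```

With the update tied to memory registration, each registration would use a fresh key whether or not a local invalidate was posted beforehand.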
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:06 +0000 (14:12 +0200)]
IB/isert: Support the remote invalidation exception
We'll use remote invalidate, according to negotiation result
during connection establishment. If the initiator declared that
it supports the remote invalidate exception and the local HCA
supports IB_DEVICE_MEM_MGT_EXTENSIONS then the target will
use IB_WR_SEND_WITH_INV with the correct rkey for the response.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:05 +0000 (14:12 +0200)]
IB/isert: Declare correct flags when accepting a connection
iser target does not support zero based virtual addresses and
send with invalidate, so it should declare that it doesn't.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:04 +0000 (14:12 +0200)]
IB/isert: Remove unused file iser_proto.h
We don't need iser_proto.h anymore, remove it and
move (non-protocol) declarations to ib_isert.h
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:03 +0000 (14:12 +0200)]
IB/iser,isert: Create and use new shared header
The iser RDMA_CM negotiation protocol is shared by
the initiator and the target, so have a shared header
for the defines and structure. Move relevant items from
the initiator and target headers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:02 +0000 (14:12 +0200)]
IB/iser: set intuitive values for mr_valid
This parameter is described as "is mr valid indicator".
In other words, it indicates whether memory registration
is valid or not. So intuitive values would be:
mr_valid=True, when memory registration is valid and
mr_valid=False otherwise.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Derzhavetz [Wed, 9 Dec 2015 12:12:01 +0000 (14:12 +0200)]
IB/iser: Don't register memory for all immediate data writes
When all the task data is sent as immediate data, we are
allowed to use the local_dma_lkey as it is not sent to
the wire.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Wed, 9 Dec 2015 12:12:00 +0000 (14:12 +0200)]
IB/iser: Reuse ib_sg_to_pages
We have in iser iser_sg_to_page_vec which has exactly
the same role as ib_sg_to_pages. Customize the page_vec
to hold a fake MR so we can reuse ib_sg_to_pages.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Roi Dayan [Wed, 9 Dec 2015 12:11:59 +0000 (14:11 +0200)]
IB/iser: Fix module init not cleaning up on error flow
Destroy workqueue on transport register error, also
release kmem cache on workqueue allocation error.
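The unwind order can be sketched as below. The struct, the fail_step parameter and module_init_sketch() are hypothetical and only simulate the two failure points named above; the real module allocates a kmem cache and a workqueue and registers a transport.

```c
#include <stdbool.h>

/* Hypothetical resource flags standing in for the cache and workqueue. */
struct init_state {
	bool cache_alive;
	bool wq_alive;
};

/*
 * Initialise resources in order and unwind in reverse order on failure.
 * `fail_step` simulates which step fails: 1 = workqueue allocation,
 * 2 = transport registration, 0 = none.
 * Returns 0 on success and -1 on error.
 */
int module_init_sketch(struct init_state *st, int fail_step)
{
	st->wq_alive = false;
	st->cache_alive = true;          /* cache created */

	if (fail_step == 1)
		goto err_free_cache;     /* workqueue allocation failed */
	st->wq_alive = true;

	if (fail_step == 2)
		goto err_destroy_wq;     /* transport registration failed */

	return 0;

err_destroy_wq:
	st->wq_alive = false;            /* destroy the workqueue */
err_free_cache:
	st->cache_alive = false;         /* release the cache */
	return -1;
}
```

After either failure no resource is left allocated, which is the behaviour the fix restores.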
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sun, 29 Nov 2015 22:02:51 +0000 (23:02 +0100)]
IB/core: constify mmu_notifier_ops structures
This mmu_notifier_ops structure is never modified, so declare it as
const, like the other mmu_notifier_ops structures.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 15:52:04 +0000 (16:52 +0100)]
IB/iser: constify iser_reg_ops structure
The iser_reg_ops structures are never modified, so declare them as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Julia Lawall [Sat, 28 Nov 2015 14:00:37 +0000 (15:00 +0100)]
RDMA/nes: constify nes_cm_ops structure
The nes_cm_ops structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bodong Wang [Fri, 18 Dec 2015 11:53:20 +0000 (13:53 +0200)]
IB/mlx5: report tx/rx checksum cap in query results
This patch will report the tx/rx checksum cap for raw qp via the
query device results.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:53 +0000 (09:31 +0200)]
IB/mlx4: Convert kmalloc to kmalloc_array for checkpatch
Convert kmalloc to be kmalloc_array to fix warnings below:
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ qp->rq.wrid = kmalloc(qp->rq.wqe_cnt * sizeof(u64),
WARNING: Prefer kmalloc_array over kmalloc with multiply
+ srq->wrid = kmalloc(srq->msrq.max * sizeof(u64),
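The reason checkpatch prefers the array form is that the multiplication can overflow. A userspace analogue, given here only as a sketch (alloc_array() is a hypothetical helper built on malloc(), not the kernel function), looks like this:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of an overflow-checked array allocation: refuse
 * the request when n * size would wrap around, instead of silently
 * allocating a buffer that is too small.
 */
void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}
```

A call such as alloc_array(wqe_cnt, sizeof(uint64_t)) then fails cleanly for an absurdly large wqe_cnt rather than returning a short buffer.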
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Thu, 17 Dec 2015 07:31:52 +0000 (09:31 +0200)]
IB/mlx4: Suppress non-fatal memory allocations
Failure in kmalloc memory allocations will throw a warning about it.
Such warnings are not needed anymore, since commit 0ef2f05c7e02
("IB/mlx4: Use vmalloc for WR buffers when needed") added a fallback
mechanism from kmalloc() to __vmalloc().
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:10 +0000 (16:34 +0200)]
IB/mlx5: Advertise atomic capabilities in query device
In order to ensure IB spec atomic correctness in atomic operations, if
HW is configured to host endianness, advertise IB_ATOMIC_HCA. If not,
advertise IB_ATOMIC_NONE.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Eran Ben Elisha [Mon, 14 Dec 2015 14:34:09 +0000 (16:34 +0200)]
net/mlx5_core: Add setting ATOMIC endian mode
HW is capable of 2 requestor endianness modes for standard 8 Bytes
atomic: BE (0x0) and host endianness (0x1). Read the supported modes
from hca atomic capabilities and configure HW to host endianness mode if
supported.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 08:29:17 +0000 (13:59 +0530)]
iw_cxgb3: Fix incorrectly returning error on success
The cxgb3_*_send() functions return NET_XMIT_ values, which are
positive integer values. So don't treat positive return values
as an error.
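As a sketch of the intended check, only negative values are propagated as errors. xmit_status_to_errno() is a hypothetical helper name, and the numeric values in the comment are the usual NET_XMIT_ definitions, quoted here as an assumption.

```c
/*
 * Map a send routine's return value to an error code: negative values
 * are real errors and are passed through, while zero and positive
 * values (for example NET_XMIT_DROP = 1 or NET_XMIT_CN = 2) are
 * transmit status codes and are reported as success.
 */
int xmit_status_to_errno(int ret)
{
	return ret < 0 ? ret : 0;
}
```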
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Fri, 11 Dec 2015 07:32:01 +0000 (13:02 +0530)]
iw_cxgb4: Pass qid range to user space driver
Enhances the t4_dev_status_page to pass the qid start and size
attributes from iw_cxgb4 to libcxgb4.
Bump the ABI version to 3 to allow libcxgb4 to detect old drivers and
revert to the old way of computing the qid ranges.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 10 Dec 2015 21:52:30 +0000 (16:52 -0500)]
IB/mad: Ensure fairness in ib_mad_completion_handler
It was found that when a process was rapidly sending MADs other processes could
be hung in their unregister calls.
This would happen when process A was injecting packets fast enough that the
single threaded workqueue was never exiting ib_mad_completion_handler.
Therefore when process B called flush_workqueue via the unregister call it
would hang until process A stopped sending MADs.
The fix is to periodically reschedule ib_mad_completion_handler after
processing a large number of completions. The number of completions chosen was
decided based on the defaults for the recv queue size. However, it was kept
fixed such that increasing those queue sizes would not adversely affect
fairness in the future.
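The bounded-batch idea can be sketched as follows. COMPLETION_BUDGET, struct cq_sim and completion_handler() are hypothetical; the budget value here is arbitrary and is not the number chosen in the patch.

```c
#include <stdbool.h>

/* Hypothetical fixed budget; the real value is chosen in the patch. */
#define COMPLETION_BUDGET 4

/* Hypothetical queue model: just a count of pending completions. */
struct cq_sim {
	int pending;
};

/*
 * Process at most COMPLETION_BUDGET completions, then return.
 * Returns true if work remains and the handler should be requeued,
 * which gives other work items (such as a flush) a chance to run.
 */
bool completion_handler(struct cq_sim *cq, int *processed)
{
	int budget = COMPLETION_BUDGET;

	*processed = 0;
	while (cq->pending > 0 && budget-- > 0) {
		cq->pending--;
		(*processed)++;
	}
	return cq->pending > 0;
}
```

Because the budget is a constant, enlarging the receive queue does not lengthen the time a single handler invocation can hold the workqueue.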
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:11 +0000 (12:16 +0200)]
IB/mlx5: Add driver cross-channel support
Add support for cross-channel functionality to the mlx5
driver. This includes the ability to ignore overrun for a CQ
that is intended for cross-channel use, exporting the device
capability, and configuring the QP as sync master/slave queues.
The cross-channel enabled QP supports combination of
three possible properties:
* WQE processing on the receive queue of this QP
* WQE processing on the send queue of this QP
* WQE are supported on the send queue
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:10 +0000 (12:16 +0200)]
IB/core: Add cross-channel support
The cross-channel feature allows executing WQEs that involve
synchronization of I/O operations on different QPs.
This capability makes it possible to program complex flows with a single
function call, thereby significantly reducing the overhead associated
with I/O processing.
Cross-channel operations support is indicated by HCA capability
information.
The queue pairs can be configured to work as a “sync master queue”
or “sync slave queues”.
The added flags are:
1. Device capability flag IB_DEVICE_CROSS_CHANNEL for the
devices that can perform cross-channel operations.
2. CQ property flag IB_CQ_FLAGS_IGNORE_OVERRUN to disable CQ overrun
check. This check is useless in cross-channel scenario.
3. QP property flags to indicate if queues are slave or master:
* IB_QP_CREATE_MANAGED_SEND indicates that posted send work requests
will not be executed immediately and require enabling.
* IB_QP_CREATE_MANAGED_RECV indicates that posted receive work
requests will not be executed immediately and require enabling.
* IB_QP_CREATE_CROSS_CHANNEL declares the QP to work in cross-channel
mode. If IB_QP_CREATE_MANAGED_SEND and IB_QP_CREATE_MANAGED_RECV are
not provided, this QP will be sync master queue, else it will be sync
slave.
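The master/slave rule for the QP flags can be sketched as below. The bit values and the classify_qp() helper are hypothetical (the real flags are the IB_QP_CREATE_* values in the kernel headers), and the sketch assumes that setting either managed flag is enough to make the QP a sync slave.

```c
/* Hypothetical bit values; the real ones live in the kernel headers. */
enum qp_create_flags {
	QP_CREATE_CROSS_CHANNEL = 1 << 0,
	QP_CREATE_MANAGED_SEND  = 1 << 1,
	QP_CREATE_MANAGED_RECV  = 1 << 2,
};

enum qp_sync_role { QP_NOT_CROSS_CHANNEL, QP_SYNC_MASTER, QP_SYNC_SLAVE };

/*
 * Classify a QP from its creation flags: without the cross-channel flag
 * it is an ordinary QP; with it, any managed flag makes it a sync slave
 * (an assumption of this sketch), otherwise it is a sync master.
 */
enum qp_sync_role classify_qp(unsigned int flags)
{
	if (!(flags & QP_CREATE_CROSS_CHANNEL))
		return QP_NOT_CROSS_CHANNEL;
	if (flags & (QP_CREATE_MANAGED_SEND | QP_CREATE_MANAGED_RECV))
		return QP_SYNC_SLAVE;
	return QP_SYNC_MASTER;
}
```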
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 20 Dec 2015 10:16:09 +0000 (12:16 +0200)]
IB/core: Align coding style of ib_device_cap_flags structure
Modify enum ib_device_cap_flags such that other patches which add new
enum values pass strict checkpatch.pl checks.
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:13 +0000 (20:30 +0200)]
IB/mlx5: Mmap the HCA's core clock register to user-space
In order to read the HCA's current cycles register, we need
to map it to user-space. Add support to map this register
via mmap command.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:12 +0000 (20:30 +0200)]
IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext
Passing hca_core_clock_offset to user-space is mandatory in order to
let user-space read the free-running clock register from the
right offset in the memory mapped page.
Passing this value is done by changing the vendor's command
and response of init_ucontext to be in extensible form.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:11 +0000 (20:30 +0200)]
IB/mlx5: Add support for hca_core_clock and timestamp_mask
Report the hca_core_clock (in kHz) and the timestamp_mask in the
query_device extended verb. timestamp_mask tells users the valid
range of the raw timestamps, while hca_core_clock reports the clock
frequency that is used for timestamps.
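As an illustration of how a user might combine the two values, the sketch below converts the difference between two raw timestamps to nanoseconds. ts_delta_ns() is a hypothetical helper; it assumes timestamp_mask is a contiguous run of low bits (2^n - 1) and that the cycle count is small enough for the multiplication not to overflow 64 bits.

```c
#include <stdint.h>

/*
 * Convert the difference between two raw timestamps to nanoseconds.
 * Both values are masked so that only valid bits are used, and the
 * difference is masked again so a single counter wraparound between
 * `start` and `end` is handled.
 */
uint64_t ts_delta_ns(uint64_t start, uint64_t end,
		     uint64_t timestamp_mask, uint64_t core_clock_khz)
{
	uint64_t cycles = ((end & timestamp_mask) - (start & timestamp_mask))
			  & timestamp_mask;

	/* cycles / (kHz * 1000) seconds == cycles * 1e6 / kHz nanoseconds */
	return cycles * 1000000ULL / core_clock_khz;
}
```

For example, 32 cycles on a clock reported as 1000 kHz correspond to 32000 ns.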
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:10 +0000 (20:30 +0200)]
IB/core: Add ib_is_udata_cleared
Extending core and vendor verb commands requires us to check that the
unknown part of the user's given command is all zeros.
Add ib_is_udata_cleared in order to do so.
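The check itself amounts to scanning the part of the buffer that the kernel does not understand. The sketch below shows the idea on a plain memory buffer; tail_is_cleared() is a hypothetical helper, whereas the real function operates on user-supplied udata and therefore has to copy the bytes from user space first.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the `len` bytes of `buf` starting at `offset` are all
 * zero, i.e. the caller did not set any field beyond the known part of
 * the command. An empty range counts as cleared.
 */
bool tail_is_cleared(const unsigned char *buf, size_t offset, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[offset + i] != 0)
			return false;
	return true;
}
```

A command handler would reject the request when this returns false, since a non-zero tail means the user asked for something this kernel cannot honour.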
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Tue, 15 Dec 2015 18:30:09 +0000 (20:30 +0200)]
IB/mlx5: Add create_cq extended command
In order to create a CQ that supports timestamp, mlx5 needs to
support the extended create CQ command with the timestamp flag.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:29 +0000 (08:20 -0600)]
IB/core: Display extended counter set if available
Check if the extended counters are available and if so
create the proper extended and additional counters.
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:28 +0000 (08:20 -0600)]
IB/core: Specify attribute_id in port_table_attribute
Add the attr_id on port_table_attribute since we will have to add
a different port_table_attribute for the extended attribute soon.
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Lameter [Mon, 21 Dec 2015 14:20:27 +0000 (08:20 -0600)]
IB/core: Create get_perf_mad function in sysfs.c
Create a new function to retrieve performance management
data from the existing code in get_pma_counter().
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:58 +0000 (18:30 +0200)]
MAINTAINERS: Assign maintainer to Mellanox mlx4 core and IB drivers
The driver was originally written by Roland Dreier; currently there's
no official maintainer. Yishai steps in as maintainer.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Roland Dreier <roland@kernel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Or Gerlitz [Wed, 23 Dec 2015 16:30:57 +0000 (18:30 +0200)]
MAINTAINERS: Assign new maintainers to Mellanox mlx5 core and IB drivers
Matan and Leon step in as co-maintainers to replace Eli Cohen
who wrote and maintained the core and IB drivers.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:54 +0000 (19:12 +0100)]
IB: remove the write-only usecnt field from struct ib_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:53 +0000 (19:12 +0100)]
IB: remove the struct ib_phys_buf definition
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:52 +0000 (19:12 +0100)]
ehca: stop using struct ib_phys_buf
And simplify the calling convention for full-memory registrations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:51 +0000 (19:12 +0100)]
amso1100: fold c2_reg_phys_mr into c2_get_dma_mr
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:50 +0000 (19:12 +0100)]
nes: simplify nes_reg_phys_mr calling conventions
Just pass an address/size pair instead of an ib_phys_buf array.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Christoph Hellwig [Wed, 23 Dec 2015 18:12:49 +0000 (19:12 +0100)]
cxgb3: simplify iwch_get_dma_wr
Fold simplified versions of build_phys_page_list and
iwch_register_phys_mem into iwch_get_dma_wr now that no other callers
are left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>