Bart Van Assche [Tue, 11 Aug 2015 00:08:44 +0000 (17:08 -0700)]
IB/srp: Introduce srp_device.use_fmr
Introduce the variable srp_device.use_fmr. Leave out the dev->has_fr /
dev->has_fmr and ch->fr_pool / ch->fmr_pool checks since these are
redundant. This patch does not change any functionality but makes the
source code easier to read.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:08:18 +0000 (17:08 -0700)]
IB/srp: Remove use_mr argument from srp_map_sg_entry()
Move the srp_map_desc() call from inside srp_map_sg_entry() to
srp_map_sg() such that the use_mr argument can be removed from
srp_map_sg_entry().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:07:46 +0000 (17:07 -0700)]
IB/srp: Remove the memory registration backtracking code
Mapping a discontiguous sg-list requires multiple memory regions
and hence can exhaust the memory region pool. The SRP initiator
already handles this by temporarily reducing the queue depth. This
means that it is safe to remove the memory registration backtracking
code. This patch has been tested with direct I/O sizes up to 256 MB.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:07:27 +0000 (17:07 -0700)]
IB/srp: Add memory descriptor array pointer range checking
Although most paths through which a request is submitted check
block layer parameters like the max_segments limit, these are
not checked when an SG_IO or direct I/O request is submitted.
Hence add a range check for the memory descriptor array pointer.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:06:57 +0000 (17:06 -0700)]
IB/srp: Use multiple registrations for large memory regions
Instead of using the global rkey for large memory regions, use
multiple registrations. See also the while (dma_len) loop further
down in srp_map_sg_entry().
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Tue, 11 Aug 2015 00:06:29 +0000 (17:06 -0700)]
IB/srp: Re-enable FMR for non-page aligned buffers
During a discussion in 2011 nobody recalled why FMR was not used for
non-page aligned buffers (see also
http://thread.gmane.org/gmane.linux.drivers.rdma/7149). Re-enable FMR
for such buffers. For the reason why the srp_map_fmr() function needs
to be modified, see also patch "IB/srp: rework mapping engine to use
multiple FMR entries" (commit ID
8f26c9ff9cd0; January 2011).
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:26 +0000 (17:22 -0600)]
rds/ib: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:25 +0000 (17:22 -0600)]
net/9p: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:24 +0000 (17:22 -0600)]
ib_srpt: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:22 +0000 (17:22 -0600)]
IB/srp: Use pd->local_dma_lkey
Replace all leys with pd->local_dma_lkey. This driver does not support
iWarp, so this is safe.
The insecure use of ib_get_dma_mr is thus isolated to an rkey, and will
have to be fixed separately.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:21 +0000 (17:22 -0600)]
iser-target: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:20 +0000 (17:22 -0600)]
IB/iser: Use pd->local_dma_lkey
Replace all leys with pd->local_dma_lkey. This driver does not support
iWarp, so this is safe.
The insecure use of ib_get_dma_mr is thus isolated to an rkey, and this
looks trivially fixed by forcing the use of registration in a future
patch.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:19 +0000 (17:22 -0600)]
IB/mlx5: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:18 +0000 (17:22 -0600)]
IB/mlx4: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:17 +0000 (17:22 -0600)]
IB/ipoib: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Thu, 30 Jul 2015 23:22:16 +0000 (17:22 -0600)]
IB/mad: Remove ib_get_dma_mr calls
The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Wed, 5 Aug 2015 20:14:45 +0000 (14:14 -0600)]
IB/core: Guarantee that a local_dma_lkey is available
Every single ULP requires a local_dma_lkey to do anything with
a QP, so let us ensure one exists for every PD created.
If the driver can supply a global local_dma_lkey then use that, otherwise
ask the driver to create a local use all physical memory MR associated
with the new PD.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:06 +0000 (18:33 +0300)]
IB/iser: Chain all iser transaction send work requests
Chaning of send work requests benefits performance by
reducing the send queue lock contention (acquired in
ib_post_send) and saves us HW doorbells which is posted
only once.
Currently, in normal IO flows iser does not chain the CDB send
work request with the registration work request. Also in PI
flows, signature work requests are not chained as well.
Lets chain those and post only once.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:05 +0000 (18:33 +0300)]
IB/iser: Add debug prints to the various memory registration methods
Easier to debug when we have the registration details.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:04 +0000 (18:33 +0300)]
IB/iser: Support up to 8MB data transfer in a single command
iser support up to 512KB data transfer in a single scsi command.
This means that larger IOs will split to different request. While
iser can easily saturate FDR/EDR wires, some arrays are fine tuned
for 1MB (or larger) IO sizes, hence add an option to support larger
transfers (up to 8MB) if the device allows it.
Given that a few target implementations don't support data transfers
of more than 512KB by default and the fact that larger IO sizes require
more resources, we introduce a module parameter to determine the
maximum number of 512B sectors in a single scsi command.
Users that are interested in larger transfers can change this value given
that the target supports larger transfers.
At the moment, iser works in 4K pages granularity, In a later stage
we will get it to work with system page size instead.
IO operations that consists of N pages will need a page vector
of size N+1 in case the first SG element contains an offset. Given
that some devices allocates memory regions in powers of 2, this
means that allocating a region with N+1 pages, will result in
region resources allocation of the next power of 2. Since we don't
want that to happen, in case we are in the limit of IO size supported
and the first SG element has an offset, we align the SG list using a
bounce buffer (which is OK given that this is not likely to happen a lot).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:03 +0000 (18:33 +0300)]
IB/iser: Pass registration pool a size parameter
Hard coded for now. This will allow to allocate different
sized MRs depending on the IO size needed (and device
capabilities).
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:02 +0000 (18:33 +0300)]
IB/iser: Unify fast memory registration flows
iser_reg_rdma_mem_[fastreg|fmr] share a lot of code, and
logically do the same thing other than the buffer registration
method itself (iser_fast_reg_mr vs. iser_fast_reg_fmr).
The DIF logic is not implemented in the FMR flow as there is no
existing device that supports FMRs and Signature feature.
This patch unifies the flow in a single routine iser_reg_rdma_mem
and just split to fmr/frwr for the buffer registration itself.
Also, for symmetry reasons, unify iser_unreg_rdma_mem (which will
call the relevant device specific unreg routine).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:01 +0000 (18:33 +0300)]
IB/iser: Make reg_desc_get a per device routine
As for fmrs we will hold a single registration descriptor
as no need for multiple like in the frwr mode (descriptor
for each task). This change helps unifying the duplicate
registration code paths.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:33:00 +0000 (18:33 +0300)]
IB/iser: Rename iser_reg_page_vec to iser_fast_reg_fmr
Also, change a name of a local variable.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adir Lev [Thu, 6 Aug 2015 15:32:59 +0000 (18:32 +0300)]
IB/iser: Maintain connection fmr_pool under a single registration descriptor
This will allow us to unify the memory registration code path between
the various methods which vary by the device capabilities. This change
will make it easier and less intrusive to remove fmr_pools from the
code when we'd want to.
The reason we use a single descriptor is to avoid taking a
redundant spinlock when working with FMRs.
We also change the signature of iser_reg_page_vec to make it match
iser_fast_reg_mr (and the future indirect registration method).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:58 +0000 (18:32 +0300)]
IB/iser: Introduce iser registration pool struct
Instead of having it a part of the connection structure,
have it be under a dedicated (embedded) structure in the
connection. A logical separation of the registration pool
and the connection structure.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:57 +0000 (18:32 +0300)]
IB/iser: Move fastreg descriptor allocation to iser_create_fastreg_desc
Don't have the caller allocate the structure and worry about
freeing it in case the routine failed.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:56 +0000 (18:32 +0300)]
IB/iser: Introduce iser_reg_ops
Move all the per-device function pointers to an easy
extensible iser_reg_ops structure that contains all
the iser registration operations.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:55 +0000 (18:32 +0300)]
IB/iser: Remove dead code in fmr_pool alloc/free
In the past the we always tried to allocate an fmr_pool
and if it failed on ENOSYS (not supported) then we continued
with dma mr. This is not the case anymore and if we tried to
allocate an fmr_pool then it is supported and we expect to succeed.
Also, the check if fmr_pool is allocated when free is called is
redundant as well as we are guaranteed it exists.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:54 +0000 (18:32 +0300)]
IB/iser: Rename struct fast_reg_descriptor -> iser_fr_desc
Avoid struct names without iser_ prefix.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:53 +0000 (18:32 +0300)]
IB/iser: Introduce struct iser_reg_resources
Have fast_reg_descriptor hold struct iser_reg_resources
(mr, frpl, valid flag). This will be useful when the
actual buffer registration routines will be passed with
the needed registration resources (i.e. iser_reg_resources)
without being aware of their nature (i.e. data or protection).
In order to achieve this, we remove reg_indicators flags container
and place specific flags (mr_valid) within iser_reg_resources struct.
We also place the sig_mr_valid and sig_protcted flags in iser_pi_context.
This patch also modifies iser_fast_reg_mr to receive the
reg_resources instead of the fast_reg_descriptor and a data/protection
indicator.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:52 +0000 (18:32 +0300)]
IB/iser: Remove an unneeded print for unaligned memory
We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_counce_buf just for the print.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:51 +0000 (18:32 +0300)]
IB/iser: Remove a redundant always-false condition
We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so no
point in checking it in iser_create_send_desc().
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:50 +0000 (18:32 +0300)]
IB/iser: Fix possible bogus DMA unmapping
If iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().
Fixes:
7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:49 +0000 (18:32 +0300)]
IB/iser: Get rid of un-maintained counters
We don't update those anywhere in the code and they
seem pretty useless (no one seem to care about those).
qp_tx_queue_full: We never should get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts
Go ahead and remove them.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:48 +0000 (18:32 +0300)]
IB/iser: Fix missing return status check in iser_send_data_out
Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.
Fixes:
7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:47 +0000 (18:32 +0300)]
IB/iser: Remove '.' from log message
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:46 +0000 (18:32 +0300)]
IB/iser: Change minor assignments and logging prints
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Falkovich [Thu, 6 Aug 2015 15:32:45 +0000 (18:32 +0300)]
IB/iser: Change some module parameters to be RO
While we're at it, use permission defines instead
of octal values and rearrange a little bit.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:09 +0000 (08:52 -0400)]
IB/sa: Route SA pathrecord query through netlink
This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:08 +0000 (08:52 -0400)]
IB/sa: Allocate SA query with kzalloc
Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:07 +0000 (08:52 -0400)]
IB/core: Add rdma netlink helper functions
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:06 +0000 (08:52 -0400)]
IB/netlink: Add defines for local service requests through netlink
This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 14 Aug 2015 18:01:09 +0000 (11:01 -0700)]
IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails
scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().
Reported-by: Konstantin Krotov <kkv@clodo.ru>
Fixes:
fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:13:52 +0000 (14:13 -0700)]
IB/srp: Bump driver version and release date
Since version 1.0 e.g. scsi-mq has been added. Since this is
a significant change, bump the driver version and release date.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:13:22 +0000 (14:13 -0700)]
IB/srp: Handle partial connection success correctly
Avoid that the following kernel warning is reported if the SRP
target system accepts fewer channels per connection than what
was requested by the initiator system:
WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<
ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<
ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<
ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<
ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<
ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<
ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<
ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<
ffffffff81191fc4>] vfs_write+0xb4/0x130
[<
ffffffff8119276f>] sys_write+0x5f/0xa0
[<
ffffffff815a0a59>] system_call_fastpath+0x16/0x1b
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:12:48 +0000 (14:12 -0700)]
IB/srp: Constify a function argument
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ariel Nahum [Sun, 9 Aug 2015 08:16:27 +0000 (11:16 +0300)]
IB/mlx4: Fix incorrect cq flushing in error state
When handling a device internal error, the driver is responsible to
drain the completion queue with flush errors.
In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors of
inflight wqes. The driver must correctly pass the wc array with an
offset as a result of the previous send queue iteration. Not doing so
will overwrite previously set completions and return a wrong number
of polled completions which includes ones which were not correctly set.
Fixes:
35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Noa Osherovich [Thu, 30 Jul 2015 14:34:24 +0000 (17:34 +0300)]
IB/mlx4: Use correct SL on AH query under RoCE
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
Fixes:
fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:23 +0000 (17:34 +0300)]
IB/mlx4: Forbid using sysfs to change RoCE pkeys
The pkey mapping for RoCE must remain the default mapping:
VFs:
virtual index 0 = mapped to real index 0 (0xFFFF)
All others indices: mapped to a real pkey index containing an
invalid pkey.
PF:
virtual index i = real index i.
Don't allow users to change these mappings using files found in
sysfs.
Fixes:
c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:22 +0000 (17:34 +0300)]
IB/mlx4: Demote mcg message from warning to debug
The mcg "too many pending requests" warning message fills the log
when OpenSM is downed. Demote the message from warning level to
debug level.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:21 +0000 (17:34 +0300)]
IB/mlx4: Fix potential deadlock when sending mad to wire
send_mad_to_wire takes the same spinlock that is taken in
the interrupt context. Therefore, it needs irqsave/restore.
Fixes:
b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Sat, 15 Aug 2015 14:16:14 +0000 (10:16 -0400)]
IB/core: Remove needless bracketization
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Thu, 30 Jul 2015 15:33:31 +0000 (18:33 +0300)]
RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core
1.Change query_gid hook to return value from IB/Core GID
management APIs.
2.Get rid of all the netdev notifier chain subscription code as well
as maintenance of SGID Table in memory.
3.Implement get_netdev hook in driver.
Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:30 +0000 (18:33 +0300)]
IB/mlx4: Replace mechanism for RoCE GID management
Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:29 +0000 (18:33 +0300)]
IB/mlx4: Implement ib_device callbacks
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.
modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:28 +0000 (18:33 +0300)]
net/mlx4: Postpone the registration of net_device
The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This makes the netdev event NETDEV_REGISTER to be sent in a context
where the answer to get_protocol_dev() callback returns NULL. This may
be confusing to listeners of netdev events.
This patch is a preparation to the patch that implements the
get_netdev() callback in the IB/mlx4 driver.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:27 +0000 (18:33 +0300)]
IB/core: Add RoCE table bonding support
Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.
Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).
Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
(1) if a related net-device is linked, delete all inactive
slaves default GIDs and add the upper device GIDs.
(2) if a related net-device is unlinked, delete all upper GIDs
and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
(1) delete the bond GIDs from inactive slaves
(2) delete the inactive slave's default GIDs
(3) Add the bond GIDs to the active slave.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Tue, 18 Aug 2015 09:22:10 +0000 (12:22 +0300)]
IB/core: missing curly braces in ib_find_gid()
Smatch says that, based on the indenting, we should probably add curly
braces here.
Fixes:
03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:26 +0000 (18:33 +0300)]
IB/core: Add RoCE GID table management
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.
Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.
In order to populate the GID table, we listen for events:
(a) netdev up/down/change_addr events - if a netdev is built onto
our RoCE device, we need to add/delete its IPs. This involves
adding all GIDs related to this ndev, add default GIDs, etc.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.
RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.
RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).
Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.
While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.
In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.
When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).
The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.
The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Tue, 4 Aug 2015 21:23:34 +0000 (15:23 -0600)]
IB/core: Make ib_alloc_device init the kobject
This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.
Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:24 +0000 (18:33 +0300)]
net/bonding: Export bond_option_active_slave_get_rcu
Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:23 +0000 (18:33 +0300)]
net: Add info for NETDEV_CHANGEUPPER event
Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried.
Add information in the notifier info block for that purpose.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:22 +0000 (18:33 +0300)]
net/ipv6: Export addrconf_ifid_eui48
For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.
addrconf_ifid_eui48 is used to gernerate this address, export it.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:48 +0000 (10:32 +0300)]
IB/core: Drop ib_alloc_fast_reg_mr
Fully replaced by a more generic and suitable
ib_alloc_mr.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Fri, 7 Aug 2015 14:51:25 +0000 (10:51 -0400)]
IB/hfi1: Support ib_alloc_mr verb
Ported from upstream qib commit
68c02e232b8a ("qib: Support ib_alloc_mr verb")
Tested-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:47 +0000 (10:32 +0300)]
qib: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:46 +0000 (10:32 +0300)]
nes: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:45 +0000 (10:32 +0300)]
cxgb3: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:44 +0000 (10:32 +0300)]
iw_cxgb4: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:43 +0000 (10:32 +0300)]
ocrdma: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:42 +0000 (10:32 +0300)]
mlx4: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:41 +0000 (10:32 +0300)]
mlx5: Drop mlx5_ib_alloc_fast_reg_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:40 +0000 (10:32 +0300)]
RDS: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Fri, 7 Aug 2015 16:11:20 +0000 (11:11 -0500)]
svcrdma: limit FRMR page list lengths to device max
Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:39 +0000 (10:32 +0300)]
xprtrdma, svcrdma: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:38 +0000 (10:32 +0300)]
IB/srp: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:37 +0000 (10:32 +0300)]
iser-target: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:36 +0000 (10:32 +0300)]
IB/iser: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:35 +0000 (10:32 +0300)]
IB: Modify ib_create_mr API
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:34 +0000 (10:32 +0300)]
IB/core: Get rid of redundant verb ib_destroy_mr
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 27 Aug 2015 12:55:15 +0000 (15:55 +0300)]
IB/cma: Fix net_dev reference leak with failed requests
When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.
Fixes:
0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:26 +0000 (17:50 +0300)]
IB/cm: Remove compare_data checks
Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:25 +0000 (17:50 +0300)]
IB/cma: Share ib_cm_ids between rdma_cm_ids
Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:24 +0000 (17:50 +0300)]
IB/cma: Use found net_dev for passive connections
When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:23 +0000 (17:50 +0300)]
IB/cma: Validate routing of incoming requests
Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:22 +0000 (17:50 +0300)]
IB/cma: Add net_dev and private data checks to RDMA CM
Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:21 +0000 (17:50 +0300)]
IB/cm: Expose BTH P_Key in CM and SIDR request events
The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.
See discussion at:
http://www.spinics.net/lists/netdev/msg336067.html
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:20 +0000 (17:50 +0300)]
IB/cma: Helper functions to access port space IDRs
Add helper functions to access the IDRs by port-space and port number.
Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:19 +0000 (17:50 +0300)]
IB/cma: Refactor RDMA IP CM private-data parsing code
When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.
When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:18 +0000 (17:50 +0300)]
IB/cm: Share listening CM IDs
Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.
This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].
[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
http://www.spinics.net/lists/netdev/msg328860.html
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:17 +0000 (17:50 +0300)]
IB/cm: Expose service ID in request events
Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Guy Shapiro [Thu, 30 Jul 2015 14:50:16 +0000 (17:50 +0300)]
IB/ipoib: Return IPoIB devices matching connection parameters
Implement the get_net_device_by_port_pkey_ip callback that returns network
device to ib_core according to connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.
For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yotam Kenneth [Thu, 30 Jul 2015 14:50:15 +0000 (17:50 +0300)]
IB/core: Find the network device matching connection parameters
In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns network device according to connection parameters.
The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such configuration may be desireable to support ipvlan-like
configurations for RDMA CM with IPoIB. To resolve the device in these
cases the code will also take the IP address as an additional input.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:14 +0000 (17:50 +0300)]
IB/core: lock client data with lists_rwsem
An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.
Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.
Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:13 +0000 (17:50 +0300)]
IB/core: Add rwsem to allow reading device list or client list
Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.
The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.
This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.
[1] http://www.spinics.net/lists/linux-rdma/msg24733.html
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:18 +0000 (18:10 -0500)]
RDMA/Core: remove rdma_cap_read_multi_sge() helper
This functionality already exists via the max_sge_rd
device capability.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:12 +0000 (18:10 -0500)]
svcrdma: Use max_sge_rd for destination read depths
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:07 +0000 (18:10 -0500)]
ipath,qib: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Mon, 27 Jul 2015 23:10:01 +0000 (18:10 -0500)]
mlx4, mlx5, mthca: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>