Mike Marciniszyn [Sun, 14 Feb 2016 20:44:34 +0000 (12:44 -0800)]
staging/rdma/hfi1: move txreq header code
The patch separates the txreq defines into new files, one for
verbs and one for sdma.
The verbs_txreq implementation handles the setup and teardown
of the txreq cache, so the register routine is changed to call
the new init/exit routines.
This patch allows follow-up patches to enhance the send engine.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:26 +0000 (12:44 -0800)]
IB/rdmavt: close send engine struct holes
pahole noted the wasted 4 bytes after s_lock and r_lock.
Move s_flags and r_psn to fill the holes.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
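As a rough user-space illustration of the struct-hole fix in the commit above: the layouts below are hypothetical stand-ins, not the real QP structure. A uint32_t plays the role of the lock (assuming a 4-byte spinlock), and next_ptr is an invented member that needs pointer alignment. Only s_lock and s_flags are names taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 4-byte lock followed by a pointer-aligned member.
 * On LP64 targets this leaves a 4-byte hole, which is what pahole reports. */
struct send_with_hole {
	uint32_t s_lock;   /* stand-in for a 4-byte spinlock */
	void *next_ptr;    /* invented member; pointer-aligned */
	uint32_t s_flags;  /* may be followed by tail padding */
};

/* Same members, with s_flags moved up to sit directly after the lock. */
struct send_reordered {
	uint32_t s_lock;
	uint32_t s_flags;
	void *next_ptr;
};

/* Bytes wasted between the lock and the member that follows it. */
size_t hole_before_fix(void)
{
	return offsetof(struct send_with_hole, next_ptr) - sizeof(uint32_t);
}

size_t hole_after_fix(void)
{
	return offsetof(struct send_reordered, s_flags) - sizeof(uint32_t);
}
```

On a typical x86-64 build the first layout occupies 24 bytes and the second 16; the exact numbers depend on the ABI.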
Mike Marciniszyn [Sun, 14 Feb 2016 20:44:17 +0000 (12:44 -0800)]
staging/rdma/hfi1: add s_avail to qp_stats
This diagnostic capability was missed in the dual lock series.
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sun, 14 Feb 2016 20:11:28 +0000 (12:11 -0800)]
IB/qib: Destroy SMI AH before de-allocating the protection domain
If SMI AH is not destroyed before de-allocating the PD, it would result in
non-zero PD use count when de-allocating the PD, triggering a WARN_ON() at
drivers/infiniband/core/verbs.c:284 ib_dealloc_pd+0x69/0xb0 [ib_core]()
when unloading the qib driver on systems with dual-port card.
This problem has always been there in qib and was detected only after
commit 7dd78647a2c2 ("IB/core: Make ib_dealloc_pd return void") introduced
a WARN_ON in ib_dealloc_pd() that triggers if a PD's use count is non-zero
before de-allocating the PD.
Below is the call trace from the dmesg log.
[ 7264.966129] Call Trace:
[ 7264.969652] [<ffffffff81338470>] dump_stack+0x44/0x64
[ 7264.976181] [<ffffffff81086bb6>] warn_slowpath_common+0x86/0xc0
[ 7264.983656] [<ffffffff81086cfa>] warn_slowpath_null+0x1a/0x20
[ 7264.990961] [<ffffffffa025c2d9>] ib_dealloc_pd+0x69/0xb0 [ib_core]
[ 7264.998717] [<ffffffffa0044de8>] ib_mad_port_close+0xb8/0x120 [ib_mad]
[ 7265.006866] [<ffffffffa0044ebf>] ib_mad_remove_device+0x6f/0xc0 [ib_mad]
[ 7265.015224] [<ffffffffa025fc87>] ib_unregister_device+0xa7/0x140 [ib_core]
[ 7265.023738] [<ffffffffa04b5b79>] rvt_unregister_device+0x29/0x80 [rdmavt]
[ 7265.032181] [<ffffffffa088d2a2>] qib_unregister_ib_device+0x22/0x210 [ib_qib]
[ 7265.040993] [<ffffffffa085f73f>] qib_remove_one+0x1f/0x250 [ib_qib]
[ 7265.048823] [<ffffffff8137a319>] pci_device_remove+0x39/0xc0
[ 7265.055984] [<ffffffff81466a1a>] __device_release_driver+0x9a/0x140
[ 7265.063821] [<ffffffff81466bc8>] driver_detach+0xb8/0xc0
[ 7265.070579] [<ffffffff81465a15>] bus_remove_driver+0x55/0xd0
[ 7265.077717] [<ffffffff8146732c>] driver_unregister+0x2c/0x50
[ 7265.084849] [<ffffffff813789ba>] pci_unregister_driver+0x2a/0x80
[ 7265.092366] [<ffffffffa08921bd>] qib_ib_cleanup+0x37/0x65 [ib_qib]
[ 7265.100068] [<ffffffff811096d0>] SyS_delete_module+0x190/0x220
[ 7265.107379] [<ffffffff816a7bae>] entry_SYSCALL_64_fastpath+0x12/0x71
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
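The ordering problem in the qib commit above can be modeled with a toy reference count. This is only a sketch: the toy_* types and functions are hypothetical and do not mirror the ib_core API. They just show why an address handle that still references a PD makes the use-count check fire.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a protection domain counts the objects that reference it. */
struct toy_pd { int usecnt; };
struct toy_ah { struct toy_pd *pd; };

void toy_create_ah(struct toy_ah *ah, struct toy_pd *pd)
{
	ah->pd = pd;
	pd->usecnt++;      /* the AH pins the PD */
}

void toy_destroy_ah(struct toy_ah *ah)
{
	ah->pd->usecnt--;  /* drop the reference the AH held */
	ah->pd = NULL;
}

/* Returns true when the equivalent of the WARN_ON would trigger. */
bool toy_dealloc_pd(const struct toy_pd *pd)
{
	return pd->usecnt != 0;
}
```

Destroying the AH first brings the count back to zero, so the later dealloc is silent.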
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:20 +0000 (12:11 -0800)]
IB/rdmavt: Remove unnecessary exported functions
Remove exported functions which are no longer required as the
functionality has moved into rdmavt. This also requires re-ordering some
of the functions since their prototype no longer appears in a header
file. Rather than add forward declarations it is just cleaner to
re-order some of the functions.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:12 +0000 (12:11 -0800)]
IB/rdmavt: Remove signal_supported and comments
Initially it was intended that rdmavt would support some signaling
between the underlying driver and itself. However this turned out to be
unnecessary for qib and hfi1. If we need to add something like this
later to support another driver we should do it then. As of now this is
essentially dead code, so remove it.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Sun, 14 Feb 2016 20:11:03 +0000 (12:11 -0800)]
IB/rdmavt: Remove RVT_FLAGs
While hfi1 and qib were still supporting bits and pieces of core verbs
components there needed to be a way to convey if rdmavt should handle
allocation and initialization of resources like the queue pair table. Now
that all of this is moved into rdmavt there is no need for these flags.
They are no longer used in the drivers.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:55 +0000 (12:10 -0800)]
IB/qib,rdmavt: Move smi_ah to qib
Rdmavt adopted an smi_ah from qib which is not needed by hfi1. Move this
back to qib and get it out of the common library.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:45 +0000 (12:10 -0800)]
IB/qib: Setup notify free/create mad agent callbacks for rdmavt
Qib needs to be notified when mad agents are created and freed because
some counter maintenance needs to be performed. Add those callbacks at
registration time with rdmavt.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:37 +0000 (12:10 -0800)]
IB/rdmavt: Add per verb driver callback checking
For each verb validate that all requirements for driver callbacks are met.
If a function is called without checking for a valid pointer, it is a
required function. Also document what each callback function does.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
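The per-verb rule from the commit above (a callback that is called without a NULL check is a required callback) might be sketched as follows. The callback table, verb enum, and function names are illustrative assumptions and are not the rdmavt definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver callback table; names are illustrative only. */
struct toy_driver_ops {
	int (*check_ah)(int port);        /* called unconditionally: required */
	void (*notify_new_ah)(int port);  /* caller tests for NULL: optional */
};

enum toy_verb { TOY_CREATE_AH };

/* Return 0 if every callback the verb calls unconditionally is present. */
int toy_check_driver_ops(const struct toy_driver_ops *ops, enum toy_verb verb)
{
	switch (verb) {
	case TOY_CREATE_AH:
		if (!ops->check_ah)
			return -EINVAL;
		break;
	}
	return 0;
}

/* Example driver-side implementation used to fill the table. */
int toy_check_ah_ok(int port)
{
	(void)port;
	return 0;
}
```

Running such a check once at registration time means a missing required callback is reported up front instead of as a NULL dereference later.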
Dennis Dalessandro [Sun, 14 Feb 2016 20:10:29 +0000 (12:10 -0800)]
IB/rdmavt: Clean up comments and add more documentation
Add, remove, and otherwise clean up existing comments that are leftover
from the initial code postings of rdmavt. Many of the comments were added
to provide an idea on the direction we were thinking of going. Now that the
design is solidified make a pass over and clean everything up. Also add
details where lacking.
Ensure all non static functions have nano comments.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Sun, 14 Feb 2016 20:10:20 +0000 (12:10 -0800)]
staging/rdma/hfi1: Put QPs into error state after SL->SC table changes
If an SL->SC mapping table change occurs after an RC/UC QP is created,
there is no mechanism to change the SC nor the VL for that QP. The fix
is to place the QP into error state so that ULP can recreate the QP with
the new SL->SC mapping.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sun, 14 Feb 2016 20:10:12 +0000 (12:10 -0800)]
IB/rdmavt: Add trace and error print statements in post_one_wr
These trace and error print statements help in debugging issues that
are caused by messed up QP ring buffer pointers.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Sun, 14 Feb 2016 20:10:04 +0000 (12:10 -0800)]
IB/qib, staging/rdma/hfi1: add s_hlock for use in post send
This patch adds an additional lock to reduce contention on the s_lock.
This lock is used in post_send() so that the post_send is not
serialized with the send engine and other send related processing.
To do this the s_next_psn is now maintained on post_send() while
post_send() related fields are moved to a new cache line. There is
an s_avail maintained for the post_send() to mitigate trading cache
lines with the send engine. The lock is released/acquired around
releasing the just built packet to the egress mechanism.
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Sun, 14 Feb 2016 20:09:55 +0000 (12:09 -0800)]
IB/qib: Rename several functions by adding a "qib_" prefix
This would avoid conflict with the functions in hfi1 that have similar
names when both qib and hfi1 drivers are configured to be built into
the kernel. This issue came up in the 0-day build report.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vennila Megavannan [Tue, 9 Feb 2016 22:29:49 +0000 (14:29 -0800)]
IB/rdmavt, staging/rdma/hfi1: use qps to dynamically scale timeout value
A busy_jiffies variable is maintained and updated when rc qps are
created and deleted. busy_jiffies is a scaled value of the number
of rc qps in the device. busy_jiffies is incremented every rc qp
scaling interval. busy_jiffies is added to the rc timeout
in add_retry_timer and mod_retry_timer. The rc qp scaling interval
is selected based on extensive performance evaluation of targeted
workloads.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
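A minimal sketch of the scaling described in the commit above, assuming busy_jiffies is simply the RC QP count divided by the scaling interval. The interval value used here is an arbitrary placeholder (the commit only says it was chosen from performance measurements), and the toy_* names are hypothetical.

```c
/* Placeholder value for illustration only; not taken from the commit. */
#define TOY_RC_QP_SCALING_INTERVAL 5

struct toy_dev {
	unsigned int n_rc_qps;       /* current number of RC QPs */
	unsigned long busy_jiffies;  /* extra timeout derived from n_rc_qps */
};

/* Recompute the scaled value whenever the RC QP count changes. */
static void toy_rescale(struct toy_dev *dev)
{
	dev->busy_jiffies = dev->n_rc_qps / TOY_RC_QP_SCALING_INTERVAL;
}

void toy_rc_qp_created(struct toy_dev *dev)
{
	dev->n_rc_qps++;
	toy_rescale(dev);
}

void toy_rc_qp_destroyed(struct toy_dev *dev)
{
	dev->n_rc_qps--;
	toy_rescale(dev);
}

/* Timeout used when arming the retry timer: base plus the busy component. */
unsigned long toy_retry_timeout(const struct toy_dev *dev,
				unsigned long base_jiffies)
{
	return base_jiffies + dev->busy_jiffies;
}
```

With the placeholder interval, twelve RC QPs add two jiffies to every retry timeout.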
Sebastian Sanchez [Tue, 9 Feb 2016 22:29:40 +0000 (14:29 -0800)]
staging/rdma/hfi1: Turning off LED without checking if stepping is Ax
It prevents the LED from staying on when the QSFP module is
not present.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Tue, 9 Feb 2016 22:29:31 +0000 (14:29 -0800)]
staging/rdma/hfi1: actually use new RNR timer API in loopback path
The patch series which added a new API for the RNR timer did not include an
updated call in the loopback path. RC/UC RNR loopback would be broken
without this.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Tue, 9 Feb 2016 22:29:22 +0000 (14:29 -0800)]
staging/rdma/hfi1: Tune for unknown channel if configuration file is absent
Currently, the driver fails to tune the SerDes and therefore prevents
link up if the configuration file is missing or fails parsing or
validation. This patch adds a fallback option so that the 8051 is asked
to tune for an unknown channel and possibly get the link up if tuning
succeeds. It also adds a user-friendly message to update the
configuration file if it is out-of-date.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Tue, 9 Feb 2016 22:29:13 +0000 (14:29 -0800)]
staging/rdma/hfi1: Fetch platform configuration data from EFI variable
The platform configuration data has been moved into the EFI variable
store where it is populated by the HFI1 option ROM. This patch pulls
the configuration data from the new location, retaining a fallback to
request_firmware.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hari Prasath Gujulan Elango [Thu, 4 Feb 2016 19:03:45 +0000 (11:03 -0800)]
IB/qib,staging/rdma/hfi1: use setup_timer api
Replace the timer API calls that initialize a timer and then assign the
callback function with the setup_timer() API.
Signed-off-by: Hari Prasath Gujulan Elango <hgujulan@visteon.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 19:03:36 +0000 (11:03 -0800)]
IB/rdmavt: remove unused qp field
The field is a vestige from ipath.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 19:03:28 +0000 (11:03 -0800)]
IB/qib: Insure last cursor is updated prior to complete
This patch is a prerequisite for adding a separate lock
for post send.
The timing of updating s_last needs to be before returning
any send completion to avoid a race between a poll cq seeing
a completion and the post send checking for a full queue.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 19:03:19 +0000 (11:03 -0800)]
staging/rdma/hfi1: Insure last cursor is updated prior to complete
This patch is a prerequisite for adding a separate lock
for post send.
The timing of updating s_last needs to be before returning
any send completion to avoid a race between a poll cq seeing
a completion and the post send checking for a full queue.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 19:03:11 +0000 (11:03 -0800)]
staging/rdma/hfi1: add s_retry to diagnostics
This is needed to debug ULP issues with getting retry attributes
correctly specified.
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 19:03:02 +0000 (11:03 -0800)]
staging/rdma/hfi1: remove duplicate timeout print
The qp->timeout field is duplicated in the
seqfile print.
Remove it.
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 18:59:36 +0000 (10:59 -0800)]
staging/rdma/hfi1: use new RNR timer
Use the new RNR timer for hfi1.
For qib, this timer doesn't exist, so exploit driver
callbacks to use the new timer as appropriate.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 18:59:27 +0000 (10:59 -0800)]
staging/rdma/hfi1: add unique rnr timer
Add a new rnr timer to hfi1.
This allows for future optimizations having the
retry and rnr timers separate.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 18:59:18 +0000 (10:59 -0800)]
staging/rdma/hfi1: use mod_timer when appropriate
Use new timer API to optimize maintenance of
timers during ACK processing.
When we are still expecting ACKs, mod the timer
to avoid a heavyweight delete/add. Otherwise, ensure
do_rc_ack() maintains the timer as it had.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 18:59:09 +0000 (10:59 -0800)]
staging/rdma/hfi1: use new timer routines
Use the new timer routines.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 4 Feb 2016 18:59:01 +0000 (10:59 -0800)]
staging/rdma/hfi1: centralize timer routines into rc
Centralize disparate timer maintenance.
This allows for central control and changes to the RC
timer handling including future optimizations.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:38:16 +0000 (14:38 -0800)]
staging/rdma/hfi1: Removing unused struct hfi1_verbs_counters
It removes the unused struct hfi1_verbs_counters from verbs.h
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:38:07 +0000 (14:38 -0800)]
staging/rdma/hfi1: Adding support for hfi counters via sysfs
It enables access to counters in
/sys/class/infiniband/hfi1_0/ports/1/counters
by providing infrastructure when PMA queries occur. Counters symbol_error
and VL15_dropped are not supported in OPA, therefore, 0 will always be
returned. In addition, two common routines (pma_get_opa_port_dctrs,
pma_get_opa_port_ectrs) were created to query counters to avoid code
duplication.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:37:59 +0000 (14:37 -0800)]
staging/rdma/hfi1: Replacement of goto's for break/returns
It replaces gotos with break and return statements in process_perf_opa().
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:37:50 +0000 (14:37 -0800)]
staging/rdma/hfi1: Change for data type of port number
This commit changes the data type for port_num in
pma_get_opa_porterrors() from unsigned long to u8.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:37:41 +0000 (14:37 -0800)]
staging/rdma/hfi1: Fix bug that could block the process on context exit
A race was discovered in the user SDMA code, which could result
in a process being stuck in the kernel call indefinitely in
certain error conditions.
If, during the processing of a user SDMA request, there was an
error *and* all outstanding SDMA descriptors had been completed
by the time that error case was handled in the calling function,
the state of the packet queue would not get correctly updated
resulting in the process subsequently getting stuck, thinking that
there are more descriptors to be completed.
To handle this scenario, the driver now checks the submitted
packet count vs. the completed. If all submitted packets have also
been completed, the driver can safely free the request and signal
user level. Otherwise, this will be handled by the completion
callback.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
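The check described in the commit above can be reduced to a comparison of two counters. The structure and function below are hypothetical; they only sketch the decision between freeing on the error path and deferring to the completion callback.

```c
#include <stdbool.h>

struct toy_req {
	unsigned int submitted;  /* packets handed to the SDMA engine */
	unsigned int completed;  /* packets the completion callback has seen */
	bool freed;
};

/* Called on the error path of the submitting function. Returns true if the
 * request was freed here, false if the completion callback must do it. */
bool toy_handle_submit_error(struct toy_req *req)
{
	if (req->submitted == req->completed) {
		req->freed = true;  /* nothing outstanding: free and signal */
		return true;
	}
	return false;               /* packets outstanding: defer to callback */
}
```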
Dean Luick [Wed, 3 Feb 2016 22:37:32 +0000 (14:37 -0800)]
staging/rdma/hfi1: Remove unused variable nsbr
Remove unused nsbr count from PCIe Gen3 code
Reviewed-by: Stuart Summers <john.s.summers@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:37:24 +0000 (14:37 -0800)]
staging/rdma/hfi1: Make EPROM check per device
Add a variable eprom_available to each device, replacing the
global of the same name. This is to allow multiple HFI devices
with different EPROM availability to operate correctly on the
same system.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sadanand Warrier [Wed, 3 Feb 2016 22:37:15 +0000 (14:37 -0800)]
staging/rdma/hfi1: Add credits for VL0 to VL7 in snoop mode
Add a new option to the snoop ioctl which allows credits to be allocated
across all VLs. Previously only VL0 and VL15 had credits allocated.
The new option used in the ioctl HFI1_SNOOP_IOCSET_OPTS allows credits
to be allocated so that VL15 will have at least 8.5KB credits and the
other VLs will have the rest of the credits divided equally across
themselves.
The total number of credits is stored in the upper 16 bits of the
integer passed, and the cumulative value should ensure that VL0 has at
least 8.5KB and each VL a minimum of 2KB + 128 bytes.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:37:06 +0000 (14:37 -0800)]
staging/rdma/hfi1: Improve performance of user SDMA
To facilitate locked page counting, the user SDMA
routines would maintain a list of io vectors, which
were freed in the completion callback and then unpin
the associated pages during the next call into the
kernel.
Since the size of this list was unbounded, doing this
was bad for performance because the driver ended up
spending too much time freeing the io vectors.
This commit changes how the io vector freeing is done
by moving the actual page unpinning in the callback and
maintaining a count of unpinned pages. This count can
then be used during the next call into the kernel to
update the mm->pinned_vm variable (since that requires
process context and the ability to sleep.)
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
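The deferred accounting from the commit above might look like this in user space, using C11 atomics in place of the kernel primitives. The toy_ctx structure is an assumption, and pinned_vm is only a stand-in for mm->pinned_vm.

```c
#include <stdatomic.h>

struct toy_ctx {
	atomic_ulong unpinned;    /* pages unpinned by callbacks, not yet accounted */
	unsigned long pinned_vm;  /* stand-in for mm->pinned_vm */
};

/* Completion callback: the pages are unpinned here and only counted. */
void toy_complete(struct toy_ctx *ctx, unsigned long npages)
{
	atomic_fetch_add(&ctx->unpinned, npages);
}

/* Next call from the process: fold the pending count into pinned_vm. */
void toy_account(struct toy_ctx *ctx)
{
	ctx->pinned_vm -= atomic_exchange(&ctx->unpinned, 0);
}
```

The callback never touches pinned_vm, which matches the constraint that the update needs process context.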
Easwar Hariharan [Wed, 3 Feb 2016 22:36:58 +0000 (14:36 -0800)]
staging/rdma/hfi1, IB/core: Fix LinkDownReason define for consistency
LinkDownReason LocalMediaNotInstalled lacked an underscore
and was inconsistent with other defines in the same family.
This patch fixes this.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Wed, 3 Feb 2016 22:36:49 +0000 (14:36 -0800)]
staging/rdma/hfi1: Remove modify_port and port_immutable functions
Delete code from query_port which has been moved into rvt_query_port
Create a call back function to shut down a port which may be called from
rvt_modify_port
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Wed, 3 Feb 2016 22:36:40 +0000 (14:36 -0800)]
staging/rdma/hfi1: Support query gid in rdmavt
Query gid is in rdmavt, but still relies on the driver to maintain the
guid table. Add the necessary driver call back and remove the existing
verb handler.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Wed, 3 Feb 2016 22:36:31 +0000 (14:36 -0800)]
staging/rdma/hfi1: Clean up init_cntrs()
Clean up init_cntrs() by removing unnecessary memsets and debug
statements
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:36:22 +0000 (14:36 -0800)]
staging/rdma/hfi1: Fix snoop packet length calculation
The LRH has a 12 bit packet length field, not 11 bit. This caused a
snoop packet length miscalculation leading to a crash when sending a
large ping over IPoIB while running opapacketcapture.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
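To show how an 11-bit mask miscalculates a large packet as in the commit above, here is a small sketch. It assumes the length field counts 32-bit words; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define TOY_LEN_MASK_11 0x7ffu  /* too narrow: drops bit 11 */
#define TOY_LEN_MASK_12 0xfffu  /* matches a 12-bit length field */

/* Packet length in bytes, assuming the field counts 32-bit words. */
uint32_t toy_pkt_bytes(uint16_t lrh_len_field, uint16_t mask)
{
	return (uint32_t)(lrh_len_field & mask) * 4u;
}
```

For a field value of 0x900, the 12-bit mask yields 9216 bytes while the 11-bit mask yields 1024, so any buffer sized from the narrower result is too small for the packet.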
Dean Luick [Wed, 3 Feb 2016 22:36:14 +0000 (14:36 -0800)]
staging/rdma/hfi1: Correct TWSI reset
Change the TWSI reset function so it will stop the reset
once the lines are in an expected state.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Pablo Cacho <pablo.cacho@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:36:06 +0000 (14:36 -0800)]
staging/rdma/hfi1: Remove PCIe AER diagnostic message
There are several reasons why PCIE AER cannot be enabled. Do not
report the failure to enable as an error.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:35:57 +0000 (14:35 -0800)]
staging/rdma/hfi1: Implement LED beaconing for maintenance
This patch implements LED beaconing for maintenance. A MAD packet with
the LEDInfo attribute set to 1 will enable LED beaconing with a duty
cycle of 2s on and 1.5s off. A MAD packet with the LEDInfo attribute
set to 0 will disable beaconing and return the LED to normal operation.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
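The duty cycle from the commit above (2s on, 1.5s off) can be expressed as a phase test. This is only a sketch of the timing; how the driver actually schedules the LED is not shown in the commit message, and the names below are hypothetical.

```c
#include <stdbool.h>

#define TOY_BEACON_ON_MS  2000u  /* 2s on, from the commit message */
#define TOY_BEACON_OFF_MS 1500u  /* 1.5s off, from the commit message */

/* LED state at a given time since beaconing was enabled. */
bool toy_beacon_led_on(unsigned long ms_since_start)
{
	unsigned long phase =
		ms_since_start % (TOY_BEACON_ON_MS + TOY_BEACON_OFF_MS);

	return phase < TOY_BEACON_ON_MS;
}
```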
Dean Luick [Wed, 3 Feb 2016 22:35:49 +0000 (14:35 -0800)]
staging/rdma/hfi1: Split last 8 bytes of copy to user buffer
Copy the last 8 bytes of user mode RC WRITE_ONLY and WRITE_LAST
opcodes separately from the rest of the data.
It is a de-facto standard for some MPI implementations to use a
poll on the last few bytes of a verbs message to indicate that
the message has been received rather than follow the required
function method. The driver uses the kernel memcpy routine, which
becomes "rep movsb" on modern machines. This copy, while very
fast, does not guarantee in-order copy completion and the result
is an occasional perceived corrupted packet. Avoid the issue by
splitting the last 8 bytes to copy from the verbs opcodes where it
matters and performing an in-order byte copy.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
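A user-space sketch of the split copy from the commit above: the body goes through memcpy and the last 8 bytes are written one at a time in ascending order. The function name is hypothetical, and the volatile pointer is only a way to keep the compiler from merging the byte stores; no memory barriers are added, so this is not a substitute for the driver code.

```c
#include <stddef.h>
#include <string.h>

#define TOY_TAIL_BYTES 8

/* Copy len bytes. The final 8 bytes (or all of them, if len is smaller)
 * are written after the bulk copy, one at a time in ascending order. */
void toy_copy_ordered_tail(void *dst, const void *src, size_t len)
{
	size_t bulk = len > TOY_TAIL_BYTES ? len - TOY_TAIL_BYTES : 0;
	volatile unsigned char *d = (volatile unsigned char *)dst + bulk;
	const unsigned char *s = (const unsigned char *)src + bulk;
	size_t i;

	memcpy(dst, src, bulk);           /* fast path for the body */
	for (i = 0; i < len - bulk; i++)  /* in-order byte copy of the tail */
		d[i] = s[i];
}
```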
Dean Luick [Wed, 3 Feb 2016 22:35:40 +0000 (14:35 -0800)]
staging/rdma/hfi1: Fix fabric serdes reset by re-downloading firmware
A host fabric serdes reset is required to go back to polling.
However, access to the fabric serdes may have been invalidated
by the sibling HFI when it downloads its fabric serdes firmware.
Work around this by re-downloading and re-validating the serdes
firmware at reset time on Bx hardware.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:35:31 +0000 (14:35 -0800)]
staging/rdma/hfi1: Report physical state changes per device instead of globally
Make physical state change reporting be per-device, not global
to reduce excessive reports of "physical state changed"
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:35:23 +0000 (14:35 -0800)]
staging/rdma/hfi1: Properly determine error status of SDMA slots
To ensure correct operation between the driver and PSM
with respect to managing the SDMA request ring, it is
important that the status for a particular request slot
is set at the correct time. Otherwise, PSM can get out
of sync with the driver, which could lead to hangs or
errors on new requests.
Properly determining when to set the error status of
a SDMA slot depends on knowing exactly when the last txreq
for that request has been completed. This in turn requires
that the driver knows exactly how many requests have been
generated and how many of those requests have been successfully
submitted to the SDMA queue.
The previous implementation of the mid-layer SDMA API did not
provide a way for the caller of sdma_send_txlist() to know how
many of the txreqs in the input list have actually been submitted
without traversing the list and counting. Since sdma_send_txlist()
already traverses the list in order to process it, requiring
such traversal in the caller is completely unnecessary. Therefore,
it is much easier to enhance sdma_send_txlist() to return the
number of successfully submitted txreqs.
This, in turn, allows the caller to accurately determine the
progress of the SDMA request and, therefore, correctly set the
error status at the right time.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
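The API change argued for in the commit above, where the list walker reports how many requests it accepted, could be sketched like this. The list type, the fits flag, and the signature are assumptions for illustration and do not reproduce sdma_send_txlist().

```c
#include <stddef.h>

struct toy_txreq {
	struct toy_txreq *next;
	int fits;  /* nonzero if the ring has room for this request */
};

/* Submit requests in order, stop at the first one that cannot be queued,
 * and report how many were accepted so the caller need not re-walk the list. */
int toy_send_txlist(struct toy_txreq *head, unsigned int *submitted)
{
	unsigned int count = 0;
	int ret = 0;

	for (; head; head = head->next) {
		if (!head->fits) {
			ret = -1;  /* e.g. ring full */
			break;
		}
		count++;
	}
	*submitted = count;
	return ret;
}
```

The caller can then compare the returned count with the number of completions it has seen to decide when the slot status may be set.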
Dean Luick [Wed, 3 Feb 2016 22:35:14 +0000 (14:35 -0800)]
staging/rdma/hfi1: correctly check for post-interrupt packets
At the end of the packet processing interrupt and thread handler,
the RcvAvail interrupt is finally cleared down. There is a window
between the last packet check (via DMA to memory) and interrupt
clear-down. The code to recheck for a packet once the RcvAvail
interrupt is enabled must ultimately use a CSR read of RcvHdrTail
rather than depend on DMA'ed memory.
This change adds a CSR read of RcvHdrTail if the memory check does
not show a packet present. The memory check is retained as a quick
test before doing the more expensive, but always correct, CSR read.
In the ASIC, the CSR read used to force the RcvAvail clear-down write
to complete may bypass queued DMA writes to memory. The only correct
way to decide if a packet has arrived without an interrupt to push DMA
to memory ahead of itself is to read the tail directly after RcvAvail
has been cleared down. It is not sufficient to just read the tail and
skip pushing the clear-down. Both must be done. The tail read will not
push the clear-down write because it is in a different area of the chip.
At this point, it is OK to have packet data still being DMA'ed to
memory. This is the end of packet processing for previous packets.
If the driver detects a new packet has arrived before interrupts were
re-enabled, it will force a new interrupt and the interrupt will push
the packet DMAs to memory, where the driver will then react to the
interrupt and do normal packet processing.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:35:06 +0000 (14:35 -0800)]
staging/rdma/hfi1: Improve performance of SDMA transfers
Commit a0d406934a46 ("staging/rdma/hfi1: Add page lock limit
check for SDMA requests") added a mechanism to delay the clean-up
of user SDMA requests in order to facilitate proper locked page
counting.
This delayed processing was done using a kernel workqueue, which
meant that a kernel thread would have to spin up and take CPU
cycles to do the clean-up.
This proved detrimental to performance because now there are two
execution threads (the kernel workqueue and the user process)
needing cycles on the same CPU.
Performance-wise, it is much better to do as much of the clean-up
as can be done in interrupt context (during the callback) and do
the remaining work in-line during subsequent calls of the user
process into the driver.
The changes required to implement the above also significantly
simplify the entire SDMA completion processing code and eliminate
a memory corruption causing the following observed crash:
[ 2881.703362] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2881.703389] IP: [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x18e0 [hfi1]
[ 2881.703422] PGD 7d4d25067 PUD 77d96d067 PMD 0
[ 2881.703427] Oops: 0000 [#1] SMP
[ 2881.703431] Modules linked in:
[ 2881.703504] CPU: 28 PID: 6668 Comm: mpi_stress Tainted: G OENX 3.12.28-4-default #1
[ 2881.703508] Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0044.090
[ 2881.703512] task: ffff88077da8e0c0 ti: ffff880856772000 task.ti: ffff880856772000
[ 2881.703515] RIP: 0010:[<ffffffffa02897e4>] [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x
[ 2881.703529] RSP: 0018:ffff880856773c48 EFLAGS: 00010287
[ 2881.703531] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000002000
[ 2881.703534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000002000
[ 2881.703537] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[ 2881.703540] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2881.703543] R13: 0000000000000000 R14: ffff88071e782e68 R15: ffff8810532955c0
[ 2881.703546] FS: 00007f9c4375e700(0000) GS:ffff88107eec0000(0000) knlGS:0000000000000000
[ 2881.703549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2881.703551] CR2: 0000000000000000 CR3: 00000007d4cba000 CR4: 00000000003407e0
[ 2881.703554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2881.703556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2881.703558] Stack:
[ 2881.703559] ffffffff00002000 ffff881000001800 ffffffff00000000 00000000000080d0
[ 2881.703570] 0000000000000000 0000200000000000 0000000000000000 ffff88071e782db8
[ 2881.703580] ffff8807d4d08d80 ffff881053295600 0000000000000008 ffff88071e782fc8
[ 2881.703589] Call Trace:
[ 2881.703691] [<ffffffffa028b5da>] hfi1_user_sdma_process_request+0x84a/0xab0 [hfi1]
[ 2881.703777] [<ffffffffa0255412>] hfi1_aio_write+0xd2/0x110 [hfi1]
[ 2881.703828] [<ffffffff8119e3d8>] do_sync_readv_writev+0x48/0x80
[ 2881.703837] [<ffffffff8119f78b>] do_readv_writev+0xbb/0x230
[ 2881.703843] [<ffffffff8119fab8>] SyS_writev+0x48/0xc0
This commit also addresses issues related to notification of user
processes of SDMA request slot availability. The slot should be
cleaned up before the user process is notified of its
availability.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:34:58 +0000 (14:34 -0800)]
staging/rdma/hfi1: Use device file minor to identify EPROM
When writing to the EPROM, the driver will always use the
"first" device. This is incorrect for multiple cards.
Use the device file minor to determine the device to use.
Reject the generic device file.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:34:49 +0000 (14:34 -0800)]
staging/rdma/hfi1: Reduce syslog message severity and provide speed information
The syslog message causes unnecessary alarm for the single and dual port
x8 cards by reporting at an error level. This patch reduces the severity
to informational only and adds speed information.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:34:41 +0000 (14:34 -0800)]
staging/rdma/hfi1: Improve performance of TID cache look up
When TID caching was enabled, the way the driver found
RB nodes when PSM was unprogramming TID entries was by
traversing the RB tree, looking for a match on the
RcvArray entry index.
The performance of this algorithm was not only poor but
also inconsistent depending on how many RB nodes would
have to be traversed before a match was found.
The lower performance was especially evident in cases where
there was a cache miss with the cache full, requiring the
unprogramming of several TID entries.
This commit changes how RB nodes are looked up when being
freed by PSM to an index-based lookup into a flat array keyed on
the index of the RcvArray entry. This turns the entire
look-up process into an O(1) algorithm.
Special care needs to be taken for situations when TID
caching is disabled. In those cases, there is no need to
insert the RB nodes into an actual RB tree. Since the entire
RcvArray management mechanism is managed by an index-based
algorithm, the RB nodes can be saved into the flat array,
making both "insertion" and "removal" faster.
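A minimal sketch of the flat-array lookup, assuming a fixed number of RcvArray entries. The names `tid_node`, `entry_to_node`, `tid_insert()`, and `tid_remove()` and the array size are hypothetical; the driver's own structures are not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ENTRIES 8	/* hypothetical RcvArray size for the sketch */

/* Hypothetical node describing one programmed TID entry. */
struct tid_node {
	unsigned int rcventry;	/* index of the RcvArray entry */
	unsigned long addr;
};

/* One slot per RcvArray entry; NULL means "not programmed". */
static struct tid_node *entry_to_node[NUM_ENTRIES];

/* Record a node under its RcvArray index. Fails if the slot is in use. */
static int tid_insert(struct tid_node *n)
{
	assert(n != NULL);
	if (n->rcventry >= NUM_ENTRIES || entry_to_node[n->rcventry])
		return -1;
	entry_to_node[n->rcventry] = n;
	return 0;
}

/* O(1) removal: index directly instead of walking a tree. */
static struct tid_node *tid_remove(unsigned int rcventry)
{
	struct tid_node *n;

	if (rcventry >= NUM_ENTRIES)
		return NULL;
	n = entry_to_node[rcventry];
	entry_to_node[rcventry] = NULL;
	return n;
}
```

Because the key is a small dense integer, the cost of a lookup no longer depends on how many nodes are currently programmed.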
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:34:32 +0000 (14:34 -0800)]
staging/rdma/hfi1: Fix for module parameter rcvhdrcnt when it's 2097152
The driver crashes when loaded with parameter rcvhdrcnt=2097152.
The root cause was that rcvhdrcnt was initially a 32 bit variable
and its value was assigned to a 16 bit variable, truncating the
upper 16 bits. This patch prevents the user from passing a value
for rcvhdrcnt greater than 16352 (Maximum number for rcvhdrcnt).
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vennila Megavannan [Wed, 3 Feb 2016 22:34:23 +0000 (14:34 -0800)]
staging/rdma/hfi1: Allow a fair scheduling of QPs
This patch fixes the fairness issues in QP scheduling
- the timeout for cond_resched is changed to a ratio of
qp->timeout_jiffies
- workqueue_congested is used to determine if qp needs to
reschedule itself
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:34:15 +0000 (14:34 -0800)]
staging/rdma/hfi1: Fix for generic I2C interface
The original I2C interface was geared for QSFP accesses. Modify
the interface to behave more like a generic I2C controller such
that reads and writes can accept multi-byte offsets. Removed
reads following writes and moved reset to top level.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Pablo Cacho <pablo.cacho@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vennila Megavannan [Wed, 3 Feb 2016 22:34:07 +0000 (14:34 -0800)]
staging/rdma/hfi1: Change send_schedule counter to a per cpu counter
A patch to fix fairness issues in QP scheduling requires
n_send_schedule counter to be converted to a per cpu counter to reduce
cache misses.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:58 +0000 (14:33 -0800)]
staging/rdma/hfi1: Verbs Mem affinity support
Change verbs memory allocations to the device numa node. This keeps memory
close to the device for optimal performance.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:49 +0000 (14:33 -0800)]
staging/rdma/hfi1: Allocate send ctxt on device NUMA node
Allocate the user mode send context memory on the numa node which the
device is attached to for better performance.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:40 +0000 (14:33 -0800)]
staging/rdma/hfi1: Consolidate CPU/IRQ affinity support
This patch unifies the affinity support for CPU and IRQ allocations into
a single code base. The goal is to allow the driver to make intelligent
placement decision based on an overall view of processes and IRQs across
as much of the driver as possible.
Pulling all the scattered affinity code into a single code base lays the
groundwork for accomplishing the above goal. For example, previous
implementations made user process placement decisions based solely on
other user processes. This algorithm was limited as it did not take into
account IRQ placement and could result in overloading certain CPUs.
A single code base also provides a much easier way to maintain and debug
any performance issues related to affinity.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:31 +0000 (14:33 -0800)]
staging/rdma/hfi1: Remove unnecessary duplicated variable
struct hfi1_devdata contained 2 variables which represented the numa
node the device is attached to. Remove the duplicated one.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:33:22 +0000 (14:33 -0800)]
staging/rdma/hfi1: Remove unused code
This comment and code were unused. Just remove them.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ira Weiny [Wed, 3 Feb 2016 22:33:14 +0000 (14:33 -0800)]
staging/rdma/hfi1: Fix SL->SC checks
SLs which are mapped to SC15 are invalid and should fail the
operation.
For RC/UC QP types, verify the AH information at modify_qp time and
fail the modify_qp if the SL is invalid.
For other QP types check the SL during post_send via the new rdmavt
callback.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ashutosh Dixit [Wed, 3 Feb 2016 22:33:06 +0000 (14:33 -0800)]
staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM
hfi1 HW has a high PCIe ASPM L1 exit latency and also advertises an
acceptable latency less than actual ASPM latencies. Mechanisms
beyond those provided by BIOS/OS are therefore required to
enable/disable ASPM for hfi1 to provide acceptable power/performance
trade-offs. This patch adds this support.
By means of a module parameter, ASPM can be (a) always enabled
(power save mode), (b) always disabled (performance mode), or (c)
enabled/disabled dynamically. The dynamic mode implements two
heuristics to alleviate possible problems with high ASPM L1 exit
latency. ASPM is normally enabled but is disabled (a) if there are any
active user space PSM contexts, or (b) for verbs, when interrupt
activity for a context starts to increase.
A few more points about the verbs implementation. In order to reduce
lock/cache contention between multiple verbs contexts, some processing
is done at the context layer before contending for device layer
locks. ASPM is disabled when two interrupts for a context happen
within 1 millisec. A timer is scheduled which will re-enable ASPM
after 1 second should the interrupt activity cease. Normally, every
interrupt, or interrupt-pair should push the timer out
further. However, since this might increase the processing load per
interrupt, pushing the timer out is postponed for half a second. If
after half a second we get two interrupts within 1 millisec the timer
is pushed out by another second.
Finally, the kernel ASPM API is not used in this patch. This is
because this patch does several non-standard things as SW workarounds
for HW issues. As mentioned above, it enables ASPM even when advertised
actual latencies are greater than acceptable latencies. Also, whereas
the kernel API only allows drivers to disable ASPM from driver probe,
this patch enables/disables ASPM directly from interrupt context. Due
to these reasons the kernel ASPM API was not used.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Vennila Megavannan [Wed, 3 Feb 2016 22:32:57 +0000 (14:32 -0800)]
staging/rdma/hfi1: Method to toggle "fast ECN" detection
Add a per port sysfs parameter to toggle cc_prescan/Fast ECN Detection and
remove the Kconfig option which was previously used to control this.
While updating the sysfs documentation, fix the name of CCMgtA.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Wed, 3 Feb 2016 22:32:49 +0000 (14:32 -0800)]
staging/rdma/hfi1: Correctly set RcvCtxtCtrl register
The RcvCtxtCtrl register was being incorrectly set upon context
initialization and clean up, resulting in many cases in contexts using
settings from previous contexts' initialization. This resulted in bad
and unexpected behavior. This was especially important for the TailUpd
bit, which requires special handling and if set incorrectly could lead
to severely degraded performance.
This patch fixes the handling of the RcvCtxtCtrl register, ensuring that
each context gets initialized with settings applicable only for that
context. It also ensures the proper setting for the TailUpd bit by
setting it to either 0 or 1 (as needed by the context's configuration)
explicitly.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sebastian Sanchez [Wed, 3 Feb 2016 22:32:40 +0000 (14:32 -0800)]
staging/rdma/hfi1: Fix for 32-bit counter overflow in driver and hfi1stats
When 32-bit hardware counters overflow, hfi1stats misinterprets
the counters as being 64 bits causing the deltas for the
counters to be a huge number. This patch makes hfi1stats
aware that a counter is 32 bits by making the driver write
<counter name>,32 to debugfs.
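Why the width matters can be seen with a short sketch of the delta computation. `counter_delta()` is a hypothetical helper, not part of hfi1stats; it assumes the counter wrapped at most once between the two samples.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two samples of a free-running counter of the given
 * width (32 or 64 bits). Subtracting at the counter's own width makes
 * a single wrap-around come out right.
 */
static uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int width_bits)
{
	assert(width_bits == 32 || width_bits == 64);
	if (width_bits == 32)
		return (uint32_t)((uint32_t)cur - (uint32_t)prev);
	return cur - prev;
}
```

For samples 0xFFFFFFF0 and 0x10, treating the counter as 32 bits gives a delta of 32, while treating it as 64 bits gives 0xFFFFFFFF00000020, which is the kind of huge value described above.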
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:32:31 +0000 (14:32 -0800)]
staging/rdma/hfi1: Skip lcb init for simulation
The simulator does not correctly handle LCB cclk loopback.
Skip that step for simulation - it is not needed.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:32:23 +0000 (14:32 -0800)]
staging/rdma/hfi1: No firmware retry for simulation
Simulation has no firmware, so it will never move firmware
acquire to the FINAL state. Avoid that by skipping the TRY
state and moving directly to FINAL.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:32:14 +0000 (14:32 -0800)]
staging/rdma/hfi1: Don't attempt to qualify or tune loopback plugs
Loopback plugs used for testing hardware don't need to be qualified to
bring the link up unlike production cables. This patch adds an exception
for loopback plugs to the QSFP and SerDes tuning algorithm.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:32:06 +0000 (14:32 -0800)]
staging/rdma/hfi1: Make firmware failure messages warnings
Make firmware validation failure and missing firmware messages
a warning since alternates can be tried. Add an error message
when all attempts fail.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:31:57 +0000 (14:31 -0800)]
staging/rdma/hfi1: Only warn when board description is not found
Change-Id: Icc4ad27c4c67e51df8c8a203c4f16973793678ec
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Wed, 3 Feb 2016 22:31:49 +0000 (14:31 -0800)]
staging/rdma/hfi1: Fix per-VL transmit discard counts
Implement per-VL transmit counters. Not all errors can be
attributed to a particular VL, so make a best attempt.
o Extend the egress error bits used to count toward transmit
discard.
o When an egress error or send error occurs, try to map back
to a VL.
o Implement a SDMA engine to VL (back) map.
o Add per-VL port transmit counters
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Wed, 3 Feb 2016 22:31:40 +0000 (14:31 -0800)]
staging/rdma/hfi1: Fix missing firmware NULL dereference
The gen3 bump code must mark a firmware download failure as fatal.
Otherwise a later load attempt will fail with a NULL dereference.
Also:
o Only do a firmware back-off for RTL. There are no alternates for
FPGA or simulation.
o Rearrange OS firmware request order to match what is actually
loaded. This results in more coherent informational messages
in the case of missing firmware.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:31:31 +0000 (14:31 -0800)]
staging/rdma/hfi1: Support external device configuration requests from 8051
This patch implements support for turning on and off the clock data
recovery mechanisms implemented in QSFP cable on request by the DC 8051
on a per-lane basis.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:31:22 +0000 (14:31 -0800)]
staging/rdma/hfi1: Get port type from configuration file
The current code employs a heuristic to guess the port type.
The canonical location to identify the port type of the
designed platform is from the platform configuration data.
This patch uses the previously fetched port type from the platform
configuration and removes the now obsolete heuristic routine
and its associated defines.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:31:14 +0000 (14:31 -0800)]
staging/rdma/hfi1: Add active and optical cable support
This patch qualifies and tunes active and optical cables for optimal
bit error rate and signal integrity settings. These settings are
fetched from the platform configuration data.
Based on attributes of the QSFP cable as read from the SFF-8636
compliant memory map, we select the appropriate settings from the
platform configuration data (examples: TX/RX equalization, enabling
cable high power, enabling TX/RX clock data recovery mechanisms, and RX
amplitude control) and apply them to the SERDES and QSFP cable.
The platform configuration data also contains system parameters such
as maximum power dissipation supported, and the cables are qualified
based on these parameters. As part of qualifying the cables, the
correct OfflineDisabledReasons are set for the appropriate scenarios.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brent R Rothermel <brent.r.rothermel@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Wed, 3 Feb 2016 22:31:05 +0000 (14:31 -0800)]
staging/rdma/hfi1: Fix QSFP memory read/write across 128 byte boundary
The QSFP memory cache reads both lower and upper page 00H in one shot,
which leads to the address counter wrapping around to the beginning of
lower page 00H at byte 128, as defined by SFF-8636.
This patch fixes this by modifying the underlying QSFP read and writes
to avoid this wrap around.
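A common way to avoid such a wrap is to split each access so that no single transfer crosses a 128-byte boundary. The sketch below assumes that approach; it is not taken from the patch. `qsfp_mem`, `raw_read()`, `qsfp_chunk_len()`, and `qsfp_read_split()` are hypothetical, and the simulated `raw_read()` rejects a crossing transfer (rather than wrapping as the hardware would) so the problem is visible.

```c
#include <assert.h>
#include <string.h>

#define QSFP_PAGE_SIZE 128u

/* Simulated 256-byte map: lower page followed by one upper page. */
static unsigned char qsfp_mem[256];

/* Simulated single transfer; fails if it would cross a 128-byte boundary. */
static int raw_read(unsigned int addr, unsigned char *buf, unsigned int len)
{
	if (len == 0 || addr + len > sizeof(qsfp_mem))
		return -1;
	if (addr / QSFP_PAGE_SIZE != (addr + len - 1) / QSFP_PAGE_SIZE)
		return -1;
	memcpy(buf, qsfp_mem + addr, len);
	return 0;
}

/* Largest length starting at addr that stays inside one 128-byte page. */
static unsigned int qsfp_chunk_len(unsigned int addr, unsigned int remaining)
{
	unsigned int room = QSFP_PAGE_SIZE - (addr % QSFP_PAGE_SIZE);

	return remaining < room ? remaining : room;
}

/* Read len bytes as a series of transfers that never cross a boundary. */
static int qsfp_read_split(unsigned int addr, unsigned char *buf, unsigned int len)
{
	unsigned int done = 0;

	assert(buf != NULL);
	while (done < len) {
		unsigned int n = qsfp_chunk_len(addr + done, len - done);

		if (raw_read(addr + done, buf + done, n) < 0)
			return -1;
		done += n;
	}
	return (int)done;
}
```

A 16-byte read starting at byte 120 becomes two transfers, 8 bytes from the lower page and 8 bytes from the upper page.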
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Easwar Hariharan [Wed, 3 Feb 2016 22:30:57 +0000 (14:30 -0800)]
staging/rdma/hfi1: cleanup messages on qsfp_read() failure
The ":" in "%s:" adds no value.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bryan Morgan [Wed, 3 Feb 2016 22:30:49 +0000 (14:30 -0800)]
staging/rdma/hfi1: HFI reports wrong offline disabled reason when cable removed
Removing QSFP cable should report 'No Local Media' instead of
'Transient' as reported by 'opaportinfo'.
Workaround is to change the state to
OPA_LINKDOWN_REASON_LOCAL_MEDIA_NOT_INSTALLED in cable handler.
With cable still removed, 'opaportinfo bounce' should not cause a
state change to Polling, as reported by 'opaportinfo'.
Resolution is to prevent physical state change from Offline->Polling.
Use a macro to mask lower nibble of OPA_LINKDOWN_REASON* as needed
for offline_disabled_reason.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reported-by: Todd Rimmer <todd.rimmer@intel.com>
Signed-off-by: Bryan Morgan <bryan.c.morgan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Wed, 3 Feb 2016 22:30:40 +0000 (14:30 -0800)]
staging/rdma/hfi1: Remove srq functionality
srq functionality is now in rdmavt. Remove it from the hfi1 driver.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Tue, 19 Jan 2016 22:44:17 +0000 (14:44 -0800)]
staging/rdma/hfi1: Remove hfi1_query_qp function
Rely on the rvt_query_qp function defined in rdmavt.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:44:11 +0000 (14:44 -0800)]
staging/rdma/hfi1: Remove create and free mad agents
Get rid of create and free mad agent from the driver and use the rdmavt
version.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:44:06 +0000 (14:44 -0800)]
staging/rdma/hfi1: Use rdmavt device allocation function
No longer do drivers need to call into the IB core to allocate the verbs
device. Use the functionality provided by rdmavt.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:44:01 +0000 (14:44 -0800)]
staging/rdma/hfi1: Clean up register device
Now that rdmavt has solidified in its design we can clean up the driver
specific register device functions. This handles hfi1.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:55 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove post_recv and use rdmavt version
This patch removes the simple post recv function in favor of using rdmavt.
The packet receive processing still lives in the driver though.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:50 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove destroy qp verb
This removes the destroy qp verbs in favor of using rdmavt.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:44 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove modify queue pair from hfi1
In addition to removing the modify queue pair verb from hfi1 we also
remove ancillary functions which existed only for modify queue pair and
are also already present in hfi1.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:39 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove multicast verbs functions
Multicast is now supported by rdmavt. Remove the verbs multicast functions
and use that.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:33 +0000 (14:43 -0800)]
staging/rdma/hfi1: Use rdmavt version of post_send
This patch removes the post_send and post_one_send from the hfi1 driver.
The "posting" of sends will be done by rdmavt which will walk a WQE and
queue work. This patch will still provide the capability to schedule that
work as well as kick the progress. These are provided to the rdmavt layer.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:28 +0000 (14:43 -0800)]
staging/rdma/hfi1: Clean up return handling
Return directly from rvt_resize_cq rather than use a goto/label.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:22 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove CQ data structures and functions from hfi1
The completion queue is not a complex data structure and it can be removed
at the same time as its functions, unlike the more complicated queue pair,
which was done in multiple patches. This single patch removes all traces
of hfi1 specific completion queues from the hfi1 driver.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Harish Chegondi [Tue, 19 Jan 2016 22:43:17 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove query_device function
Remove the hfi1 query_device function and use the rdmavt rvt_query_device function.
The rvt dev info device attributes still need to be filled in by the driver.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:12 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove create_qp functionality
Rely on rdmavt to provide queue pair creation.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:06 +0000 (14:43 -0800)]
staging/rdma/hfi1: Remove qpdev and qpn table from hfi1
Another change on the way to removing queue pair functionality from
hfi1. This patch removes the private queue pair structure and the table
which holds the queue pair numbers in favor of using what is provided
by rdmavt.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Tue, 19 Jan 2016 22:43:01 +0000 (14:43 -0800)]
staging/rdma/hfi1: Use rdmavt send flags and recv flags
Use the definitions of the s_flags and r_flags which are now in rdmavt.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>