GitHub/moto-9609/android_kernel_motorola_exynos9610.git
7 years agoRDMA/netlink: Remove netlink clients infrastructure
Leon Romanovsky [Mon, 5 Jun 2017 07:20:11 +0000 (10:20 +0300)]
RDMA/netlink: Remove netlink clients infrastructure

RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check-in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
7 years agoRDMA/core: Add wait/retry version of ibnl_unicast
Ismail, Mustafa [Wed, 28 Jun 2017 14:02:45 +0000 (09:02 -0500)]
RDMA/core: Add wait/retry version of ibnl_unicast

Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
and modify ibnl_unicast to not wait/retry.  This eliminates
the undesirable wait for future users of ibnl_unicast.

Change Portmapper calls originating from kernel to user-space
to use ibnl_unicast_wait and take advantage of the wait/retry
logic in netlink_unicast.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
7 years agoIB/hfi1: Always perform offline transition
Sebastian Sanchez [Sat, 29 Jul 2017 15:44:01 +0000 (08:44 -0700)]
IB/hfi1: Always perform offline transition

Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Prevent link down request double queuing
Sebastian Sanchez [Sat, 29 Jul 2017 15:43:55 +0000 (08:43 -0700)]
IB/hfi1: Prevent link down request double queuing

When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Create workqueue for link events
Sebastian Sanchez [Sat, 29 Jul 2017 15:43:49 +0000 (08:43 -0700)]
IB/hfi1: Create workqueue for link events

Currently, link down interrupts queue link entries
on a workqueue intended for sending events only.
Create a workqueue for queuing link events.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compression
Mike Marciniszyn [Sat, 29 Jul 2017 15:43:43 +0000 (08:43 -0700)]
IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compression

The server side of qperf panics as follows:

[242446.336860] IP: report_bug+0x64/0x10
[242446.341031] PGD 1c0c067
[242446.341032] P4D 1c0c067
[242446.343951] PUD 1c0d063
[242446.346870] PMD 8587ea067
[242446.349788] PTE 800000083e14016
[242446.352901]
[242446.358352] Oops: 0003 [#1] SM
[242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1
[242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201
[242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000
[242446.465097] RIP: 0010:report_bug+0x64/0x10
[242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000
[242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000
[242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740
[242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025
[242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3
[242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000
[242446.516067] FS:  0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000
[242446.525191] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003
[242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e
[242446.539756] Call Trace
[242446.542582]  fixup_bug+0x2c/0x5
[242446.546277]  do_trap+0x12b/0x18
[242446.549972]  do_error_trap+0x89/0x11
[242446.554171]  ? hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.559324]  ? ttwu_do_wakeup+0x1e/0x14
[242446.563795]  ? ttwu_do_activate+0x77/0x8
[242446.568363]  do_invalid_op+0x20/0x3
[242446.572448]  invalid_op+0x1e/0x3
[242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004
[242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000
[242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24
[242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000
[242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000
[242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800
[242446.628293]  ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1
[242446.633449]  hfi1_rc_rcv+0x3da/0x1270 [hfi1
[242446.638312]  ? sc_buffer_alloc+0x113/0x150 [hfi1
[242446.643662]  hfi1_ib_rcv+0x1c9/0x2e0 [hfi1
[242446.648428]  process_receive_ib+0x19a/0x270 [hfi1
[242446.653866]  ? process_rcv_qp_work+0xd2/0x160 [hfi1
[242446.659505]  handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1
[242446.666693]  ? irq_finalize_oneshot+0x100/0x10
[242446.671846]  receive_context_thread+0x1b/0x140 [hfi1
[242446.677576]  irq_thread_fn+0x1e/0x4
[242446.681659]  irq_thread+0x13c/0x1b
[242446.685646]  ? irq_forced_thread_fn+0x60/0x6
[242446.690604]  kthread+0x112/0x15
[242446.694298]  ? irq_thread_check_affinity+0xe0/0xe
[242446.699738]  ? kthread_park+0x60/0x6
[242446.703919]  ? do_syscall_64+0x67/0x15
[242446.708292]  ret_from_fork+0x25/0x3
[242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0
[242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c
[242446.739935] CR2: ffffffffa06647e
[242446.743763] ---[ end trace 0e90a20d0aa494f7 ]--

The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok()
doesn't interpret the new return value from rvt_lkey_ok() properly
leading to an mr reference count underrun.

Additionally, remove an unused argument in rvt_sge_adjacent()
aw well as an unneeded incr local in rvt_post_one_wr().

Fixes: Commit 14fe13fcd3af ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Disambiguate corruption and uninitialized error cases
Jan Sokolowski [Sat, 29 Jul 2017 15:43:37 +0000 (08:43 -0700)]
IB/hfi1: Disambiguate corruption and uninitialized error cases

The error messages when checksum validation of the platform
configuration fields populated into the ASIC scratch registers fails are
ambiguous. Disambiguate them.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Only set fd pointer when base context is completely initialized
Michael J. Ruhl [Sat, 29 Jul 2017 15:43:32 +0000 (08:43 -0700)]
IB/hfi1: Only set fd pointer when base context is completely initialized

The allocate_ctxt() function adds the context to the fd data structure.
Since the context is not completely initialized, this can cause confusion
as to whether the context is valid or not.

Move the fd reference from allocate_ctxt() to setup_base_ctxt().
Update the necessary functions to be aware of this move.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Do not enable disabled port on cable insert
Jan Sokolowski [Sat, 29 Jul 2017 15:43:26 +0000 (08:43 -0700)]
IB/hfi1: Do not enable disabled port on cable insert

Fix issue where a disabled port can be enabled by
inserting a cable. The port should be explicitly
enabled instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Harden state transition to Armed and Active
Alex Estrin [Sat, 29 Jul 2017 15:43:20 +0000 (08:43 -0700)]
IB/hfi1: Harden state transition to Armed and Active

There is a window that allows other threads to read state of
'host_link_state' as a new, before the hardware actual state is set.
This patch closes the window by indicating a new state only after
hardware transition is complete.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Split copy_to_user data copy for better security
Michael J. Ruhl [Mon, 24 Jul 2017 14:46:42 +0000 (07:46 -0700)]
IB/hfi1: Split copy_to_user data copy for better security

A copy_to_user() call assumes that two members of a data structure
are sequential.  Since this may not always be true, separate the copies
to ensure a safe copy.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Verify port data VLs credits on transition to Armed
Alex Estrin [Mon, 24 Jul 2017 14:46:36 +0000 (07:46 -0700)]
IB/hfi1: Verify port data VLs credits on transition to Armed

There is a window where the FM can read the buffer control table
and decide not to program buffers. When a port goes down, the code
clears the table and if it is not programmed, posted SDMA descriptors
will never complete due to no buffer credits.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Move saving PCI values to a separate function
Bartlomiej Dudek [Mon, 24 Jul 2017 14:46:30 +0000 (07:46 -0700)]
IB/hfi1: Move saving PCI values to a separate function

During PCIe initialization some registers' values from
PCI config space are saved in order to restore them later
(i.e. after reset). Restoring those value is done by a
function called restore_pci_variables, while saving them
is put directly into function hfi1_pcie_ddinit.
Move saving values to a separate function in the image
of restoring functionality.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix initialization failure for debug firmware
Byczkowski, Jakub [Mon, 24 Jul 2017 14:46:24 +0000 (07:46 -0700)]
IB/hfi1: Fix initialization failure for debug firmware

Loading debug signed firmware fails if started immediately after
failed attempt to load production firmware. A short delay is
required so add about a 100us delay after RSA check failure.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix code consistency for if/else blocks in chip.c
Jan Sokolowski [Mon, 24 Jul 2017 14:46:18 +0000 (07:46 -0700)]
IB/hfi1: Fix code consistency for if/else blocks in chip.c

Code structure is not consistent for if/else blocks and break
instructions in set_link_state for case HLS_UP_INIT. Physical
state uses break in case of an error and if/else blocks for
logical use cases. These blocks should be implemented consistently.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Send MAD traps until repressed
Michael J. Ruhl [Mon, 24 Jul 2017 14:46:12 +0000 (07:46 -0700)]
IB/hfi1: Send MAD traps until repressed

A trap should be sent to the FM until the FM sends a repress message.
This is in line with the IBTA 13.4.9.

Add the ability to resend traps until a repress message is received.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael N. Henry <michael.n.henry@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Pass the context pointer rather than the index
Michael J. Ruhl [Mon, 24 Jul 2017 14:46:06 +0000 (07:46 -0700)]
IB/hfi1: Pass the context pointer rather than the index

The hfi1_rcvctrl() function receives an index which it then converts
to an rcd.  Since most functions have the rcd, use that instead.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use context pointer rather than context index
Michael J. Ruhl [Mon, 24 Jul 2017 14:46:01 +0000 (07:46 -0700)]
IB/hfi1: Use context pointer rather than context index

The hfi1_<set|clear>_ctxt_<j|p>key functions take a context index and
look up the context based on that index.

Since the context index is being retrieved from the context, this
doesn't seem optimal.

Pass the context pointer for use, rather than the context index.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Size rcd array index correctly and consistently
Michael J. Ruhl [Mon, 24 Jul 2017 14:45:55 +0000 (07:45 -0700)]
IB/hfi1: Size rcd array index correctly and consistently

The array index for the rcd array is sized several different ways
throughout the code.

Use the user interface size (u16) as the standard size and update the
necessary code to reflect this.

u16 is large enough for the largest amount of supported contexts.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove unused user context data members
Michael J. Ruhl [Mon, 24 Jul 2017 14:45:49 +0000 (07:45 -0700)]
IB/hfi1: Remove unused user context data members

Several data members of the user context have become unused over time.
Cleaning them up.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Assign context does not clean up file descriptor correctly on error
Michael J. Ruhl [Mon, 24 Jul 2017 14:45:43 +0000 (07:45 -0700)]
IB/hfi1: Assign context does not clean up file descriptor correctly on error

In the error path for context allocation, the file descriptor pointer
should not point to a context when an error occurs.

Clean up the appropriate references on error.

Fixes: Commit 62239fc6e5545b2e59f83dfbc5db231a81f37a45 ("IB/hfi1: Clean up on context initialization failure")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Serve the most starved iowait entry first
Kaike Wan [Mon, 24 Jul 2017 14:45:37 +0000 (07:45 -0700)]
IB/hfi1: Serve the most starved iowait entry first

When an egress resource(SDMA descriptors, pio credits) is not available,
a sending thread will be put on the resource's wait queue. When the
resource becomes available again, up to a fixed number of sending threads
can be awakened sequentially and removed from the wait queue, depending
on the number of waiting threads and the number of free resources. Since
each awakened sending thread will send as many packets as possible, it
is highly likely that the first sending thread will consume all the
egress resources. Subsequently, it will be put back to the end of the wait
queue. Depending on the timing when the later sending threads wake up,
they may not be able to send any packet and be again put back to the end
of the wait queue sequentially, right behind the first sending thread.
This starvation cycle continues until some sending threads exceed their
retry limit and consequently fail.

This patch fixes the issue by two simple approaches:
(1) Any starved sending thread will be put to the head of the wait queue
while a served sending thread will be put to the tail;
(2) The most starved sending thread will be served first.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix bar0 mapping to use write combining
Mike Marciniszyn [Mon, 24 Jul 2017 14:45:31 +0000 (07:45 -0700)]
IB/hfi1: Fix bar0 mapping to use write combining

When the debugpat kernel boot flag is turned on the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray:
ffffc9000a980000

The ioremap_wc() clearly is not returning a write combining mapping due
to an overlap where the RcvArray is mapped in a uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings.   One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space.  A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping.  A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to insure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

patdebug now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq are now flushed in hfi1_put_tid() and rcv_array_wc_fill().

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Check return values from PCI config API calls
Bartlomiej Dudek [Fri, 30 Jun 2017 20:14:40 +0000 (13:14 -0700)]
IB/hfi1: Check return values from PCI config API calls

Ensure that return values from kernel PCI config access
API calls in HFI driver are checked and react properly if
they are not expected (i.e. not successful).

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hns: include linux/interrupt.h
Arnd Bergmann [Fri, 28 Jul 2017 13:35:22 +0000 (15:35 +0200)]
IB/hns: include linux/interrupt.h

I ran into this build error on linux-next:

drivers/infiniband/hw/hns/hns_roce_eq.c:477:8: error: unknown type name 'irqreturn_t'
 static irqreturn_t hns_roce_msi_x_interrupt(int irq, void *eq_ptr)
        ^~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_eq.c: In function 'hns_roce_msi_x_interrupt':
drivers/infiniband/hw/hns/hns_roce_eq.c:485:9: error: implicit declaration of function 'IRQ_RETVAL'; did you mean 'BPF_RVAL'? [-Werror=implicit-function-declaration]
  return IRQ_RETVAL(int_work);

I have bisected this to a seemingly unrelated change that happened
to remove some indirect header inclusions. Simply including the
required header explicitly fixes the build failure.

Fixes: 09c7570480f7 ("xfrm: remove flow cache")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA/hns: fix build regression
Doug Ledford [Fri, 28 Jul 2017 19:17:18 +0000 (15:17 -0400)]
RDMA/hns: fix build regression

The 0day build system flags implicit includes as errors.  A patch from
Matan Barak to allow hns_roce, an aarch64 specific RDMA driver, to be
built on other arches, but it resulted in build regressions.  The
problem is that hns_roce_device.h needs a definition for __raw_writeq
but did not have an include to provide it.  Add <linux/io.h> as an
include to resolve the issue.

Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hns: fix semicolon.cocci warnings
kbuild test robot [Tue, 25 Jul 2017 05:36:25 +0000 (13:36 +0800)]
IB/hns: fix semicolon.cocci warnings

drivers/infiniband/hw/hns/hns_roce_eq.c:295:3-4: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hns: fix returnvar.cocci warnings
kbuild test robot [Tue, 25 Jul 2017 05:36:25 +0000 (13:36 +0800)]
IB/hns: fix returnvar.cocci warnings

drivers/infiniband/hw/hns/hns_roce_hw_v1.c:2026:5-8: Unneeded variable: "ret". Return "0" on line 2046

 Remove unneeded variable used to store return value.

Generated by: scripts/coccinelle/misc/returnvar.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hns: fix boolreturn.cocci warnings
kbuild test robot [Tue, 25 Jul 2017 05:36:24 +0000 (13:36 +0800)]
IB/hns: fix boolreturn.cocci warnings

drivers/infiniband/hw/hns/hns_roce_qp.c:802:9-10: WARNING: return of 0/1 in function 'hns_roce_wq_overflow' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hns: Support compile test for hns RoCE driver
Matan Barak [Thu, 8 Jun 2017 14:23:50 +0000 (17:23 +0300)]
IB/hns: Support compile test for hns RoCE driver

Compiling the hns RoCE driver requires ARM architecture.
In order to simplify development of IB/core, support
compile test. Add the necessary includes for that too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/cma: Fix default RoCE type setting
Doug Ledford [Fri, 28 Jul 2017 17:47:24 +0000 (13:47 -0400)]
IB/cma: Fix default RoCE type setting

The initial patch for changing the stack to use RoCEv2 GIDs by default
set the CMA_PREFERRED_ROCE_GID_TYPE to an incorrect value.  Instead of
an absolute value, we needed to set the right bit in a bitmask.  Correct
the default setting so we use RoCEv2 by default.

Fixes: 63a5f483af0e (IB/cma: Set default gid type to RoCEv2)
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branch 'misc' into k.o/for-next
Doug Ledford [Thu, 27 Jul 2017 13:00:38 +0000 (09:00 -0400)]
Merge branch 'misc' into k.o/for-next

7 years agoRDMA/qedr: notify user application of supported WIDs
Amrani, Ram [Mon, 26 Jun 2017 16:05:06 +0000 (19:05 +0300)]
RDMA/qedr: notify user application of supported WIDs

The number of supported WIDs, if they are supported at all, can be
limited due to resources. Notifying the user space application the
number of available WIDs allows it to utilize them correctly.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA/qedr: notify user application if DPM is supported
Amrani, Ram [Mon, 26 Jun 2017 16:05:05 +0000 (19:05 +0300)]
RDMA/qedr: notify user application if DPM is supported

Direct Packet Mode support may be disabled, e.g, due to limited
resources. Notifying the user application prevents wasting cycles
on attempting to send these kind of packets.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branches 'rxe' and 'mlx' into k.o/for-next
Doug Ledford [Thu, 27 Jul 2017 00:13:33 +0000 (20:13 -0400)]
Merge branches 'rxe' and 'mlx' into k.o/for-next

7 years agonet/mlx5: fix spelling mistake: "alloated" -> "allocated"
Colin Ian King [Tue, 27 Jun 2017 07:40:59 +0000 (08:40 +0100)]
net/mlx5: fix spelling mistake: "alloated" -> "allocated"

Trivial fix to spelling mistake in mlx5_ib_dbg message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add receiving queue info to qp_stats
Kaike Wan [Sat, 17 Jun 2017 17:37:34 +0000 (10:37 -0700)]
IB/hfi1: Add receiving queue info to qp_stats

This patch adds qp->s_ack_queue indices and size to qp_stats printout.
This information will provide information about the receiving side.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Expose RSS capabilities
Guy Levi [Tue, 4 Jul 2017 13:24:27 +0000 (16:24 +0300)]
IB/mlx4: Expose RSS capabilities

As a final step to support RSS feature, expose the RSS capabilities in
query device verb.

It includes:
- Max rwq indirection tables.
- Max rwq indirection table size.
- Supported qp types.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Add support for RSS QP
Guy Levi [Tue, 4 Jul 2017 13:24:26 +0000 (16:24 +0300)]
IB/mlx4: Add support for RSS QP

Add support to work with a RSS QP by using an indirection table object
upon QP creation. Other related QP verbs (e.g. modify/destroy/query) were
updated as well for that QP mode.

Notes:
- The RX hash properties are supplied as driver private data.
- The RSS QP port is used on the associated WQs in its indirection
  table. Applying different ports during WQ life time is not allowed.
- The expected RSS QP flow is: create, modify(RST->INIT),
  modify(RST->RTR), destroy.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Add support for WQ indirection table related verbs
Guy Levi [Tue, 4 Jul 2017 13:24:25 +0000 (16:24 +0300)]
IB/mlx4: Add support for WQ indirection table related verbs

To enable RSS functionality the IB indirection table object (i.e.
ib_rwq_ind_table) should be used.
This patch implements the related verbs as of create and destroy an
indirection table.

In downstream patches the indirection table will be used as part of RSS
QP creation.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Add support for WQ related verbs
Guy Levi [Tue, 4 Jul 2017 13:24:24 +0000 (16:24 +0300)]
IB/mlx4: Add support for WQ related verbs

Support create/modify/destroy WQ related verbs.

The base IB object to enable RSS functionality is a WQ (i.e. ib_wq).
This patch implements the related WQ verbs as of create, modify and
destroy.

In downstream patches the WQ will be used as part of an indirection
table (i.e. ib_rwq_ind_table) to enable RSS QP creation.

Notes:
ConnectX-3 hardware requires consecutive WQNs list as receive descriptor
queues for the RSS QP. Hence, the driver manages consecutive ranges lists
per context which the user must respect.
Destroying the WQ does not return its WQN back to its range for
reusing. However, destroying all WQs from the same range releases the
range and in turn releases its WQNs for reusing.

Since the WQ object is not a natural object in the hardware, the driver
implements the WQ by the hardware QP.

As such, the WQ inherits its port from its RSS QP parent upon its
RST->INIT transition and by that time its state is applied to the
hardware.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years ago(IB, net)/mlx4: Add resource utilization support
Moshe Shemesh [Wed, 21 Jun 2017 06:29:36 +0000 (09:29 +0300)]
(IB, net)/mlx4: Add resource utilization support

Adding visibility of resource usage of QPs, CQs and counters used by
virtual functions. This feature will be used to give the PF administrator
more data while debugging VF status. Usage info was added to ALLOC_RES
command, to notify the PF if the resource which is being reserved or
allocated for the VF will be used by kernel driver or by user verbs.

Updated reservation and allocation functions of QP, CQ and counter with
additional usage parameter.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Add inline-receive support
Maor Gottlieb [Wed, 21 Jun 2017 06:26:28 +0000 (09:26 +0300)]
IB/mlx4: Add inline-receive support

When inline-receive is enabled, the HCA may write received
data into the receive WQE.

Inline-receive is enabled by setting its matching bit in
the QP context and each single-packet message with payload
not exceeding the receive WQE size will be delivered to
the WQE.

The completion report will indicate that the payload was placed to the WQE.

It includes:
1) Return maximum supported size of inline-receive by the hardware
in query_device vendor's data part.
2) Enable the feature when requested by the vendor data input.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Expose extended error counters
Parav Pandit [Mon, 19 Jun 2017 04:19:37 +0000 (07:19 +0300)]
IB/mlx5: Expose extended error counters

This patch adds below requester and responder side error counters,
which will be exposed by hardware counters interface and are supported
as part of query Q counters command extension.

 +---------------------------+-------------------------------------+
 |      Name                 |           Description               |
 |---------------------------+-------------------------------------|
 |resp_local_length_error    | Number of times responder detected  |
 |                           | local length errors                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_error             | Number of CQEs completed with error |
 |                           | at responder                        |
 |---------------------------+-------------------------------------|
 |req_cqe_error              | Number of CQEs completed with error |
 |                           | at requester                        |
 |---------------------------+-------------------------------------|
 |req_remote_invalid_request | Number of times requester detected  |
 |                           | remote invalid request error        |
 |---------------------------+-------------------------------------|
 |req_remote_access_error    | Number of times requester detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_remote_access_error   | Number of times responder detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_flush_error       | Number of CQEs completed with       |
 |                           | flushed with error at responder     |
 |---------------------------+-------------------------------------|
 |req_cqe_flush_error        | Number of CQEs completed with       |
 |                           | flushed with error at requester     |
 +---------------------------+-------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Fix existence check for extended address vector
Leon Romanovsky [Mon, 12 Jun 2017 07:36:16 +0000 (10:36 +0300)]
IB/mlx5: Fix existence check for extended address vector

The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92ce ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Fix cached MR allocation flow
Majd Dibbiny [Mon, 12 Jun 2017 07:36:15 +0000 (10:36 +0300)]
IB/mlx5: Fix cached MR allocation flow

When we have a miss in one order of the mkey cache, we try to get
an mkey from a higher order.

We still need to check that the higher order can be used with UMR
before using it. Otherwise, we will get an mkey with 0 entries and
the post send operation that is used to fill it will complete with
the following error:

mlx5_0:dump_cqe:275:(pid 0): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000025 49ce59d2

Fixes: 49780d42dfc9 ("IB/mlx5: Expose MR cache for mlx5_ib")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report RX checksum capabilities for IPoIB
Yishai Hadas [Thu, 8 Jun 2017 13:15:11 +0000 (16:15 +0300)]
IB/mlx5: Report RX checksum capabilities for IPoIB

Report RX checksum capabilities when 'ipoib_enhanced_offloads'
capabilities are set and support checksum.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Report enhanced capabilities for IPoIB
Yishai Hadas [Thu, 8 Jun 2017 13:15:10 +0000 (16:15 +0300)]
net/mlx5: Report enhanced capabilities for IPoIB

Report 'ipoib_enhanced_offloads' capabilities from
the core layer, it will be used in the next patch from this series.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add multicast flow steering support for underlay QP
Yishai Hadas [Thu, 8 Jun 2017 13:15:09 +0000 (16:15 +0300)]
IB/mlx5: Add multicast flow steering support for underlay QP

In order to add multicast flow steering support, there is need
to block the attaching of mcg flow for underlay QP, recognize
multicast IB_FLOW_SPEC_IPV4 based on its IP and enable
creating/destroying flow for IB layer.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support for QP with a given source QPN
Yishai Hadas [Thu, 8 Jun 2017 13:15:08 +0000 (16:15 +0300)]
IB/mlx5: Add support for QP with a given source QPN

Allow user space applications to accelerate send and receive
traffic which is typically handled by IPoIB ULP by creating
a UD QP with a given source QPN of the IPoIB UD QP.

UD QP with a given source QPN should basically be similar to
RAW QP from point of view of its created resources.

However,
- Its TIS should point to the source QPN.
- Modify can be done only on its state as the transport attributes
  are managed by its source QP.

This patch manages below:
- Creating/destroying/modifying UD QP with a given source QPN.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Enable QP creation with a given source QP number
Yishai Hadas [Thu, 8 Jun 2017 13:15:07 +0000 (16:15 +0300)]
IB/uverbs: Enable QP creation with a given source QP number

Enable QP creation with a given source QP number, the created QP will
use the source QPN as its wire QP number.

To create such a QP, root privileges (i.e. CAP_NET_RAW) are required
from the user application.

This comes as a pre-patch for downstream patches in this series to
allow user space applications to accelerate traffic which is typically
handled by IPoIB ULP.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Enable QP creation with a given source QP number
Yishai Hadas [Thu, 8 Jun 2017 13:15:06 +0000 (16:15 +0300)]
IB/core: Enable QP creation with a given source QP number

Enable QP creation with a given source QP number.
The created QP will use the source QPN as its wire QP number.

This comes as a pre-patch for downstream patches in this series to
allow user space applications to accelerate traffic which is typically
handled by IPoIB ULP.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Add support for RoCEv2 multicast
Noa Osherovich [Mon, 12 Jun 2017 08:14:04 +0000 (11:14 +0300)]
IB/core: Add support for RoCEv2 multicast

When creating address handle from multicast GID, set MAC according to
the appropriate formula instead of searching for it in the GID table:
- For IPv4 multicast GID use ip_eth_mc_map().
- For IPv6 multicast GID use ipv6_eth_mc_map().

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Set RoCEv2 MGID according to spec
Noa Osherovich [Mon, 12 Jun 2017 08:14:03 +0000 (11:14 +0300)]
IB/core: Set RoCEv2 MGID according to spec

RoCEv2 Annex states that for RoCEv2 over IPv4, the corresponding
IPv4 address is encoded into the GID according to the following rule:
GID= :ffff:<IPv4 address>

Remove the 0xff0e prefix for RoCEv2 packets with IPv4 and leave it
zeroed and change rdma_is_multicast_addr() to consider the new logic.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Fix the validations of a multicast LID in attach or detach operations
Noa Osherovich [Mon, 12 Jun 2017 08:14:02 +0000 (11:14 +0300)]
IB/core: Fix the validations of a multicast LID in attach or detach operations

RoCE Annex (A16.9.10/11) declares that during attach (detach) QP to a
multicast group, if the QP is associated with a RoCE port, the
multicast group MLID is unused and is ignored.

During attach or detach multicast, when the QP is associated with a
port, it is enough to check the port's link layer and validate the
LID only if it is Infiniband. Otherwise, avoid validating the
multicast LID.

Fixes: 8561eae60ff9 ("IB/core: For multicast functions, verify that LIDs are multicast LIDs")
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add delay drop configuration and statistics
Maor Gottlieb [Tue, 30 May 2017 07:29:14 +0000 (10:29 +0300)]
IB/mlx5: Add delay drop configuration and statistics

Add debugfs interface for monitor the number of delay drop timeout
events and the number of existing dropless RQs in the system.

In addition add debugfs interface for configuring the global timeout value
which is used in the SET_DELAY_DROP command.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support to dropless RQ
Maor Gottlieb [Tue, 30 May 2017 07:29:13 +0000 (10:29 +0300)]
IB/mlx5: Add support to dropless RQ

RQs that were configured for "delay drop" will prevent packet drops
when their WQEs are depleted.
Marking an RQ to be drop-less is done by setting delay_drop_en in RQ
context using CREATE_RQ command.

Since this feature is globally activated/deactivated by using the
SET_DELAY_DROP command on all the marked RQs, we activated/deactivated
it according to the number of RQs with 'delay_drop' enabled.

When timeout is expired, then the feature is deactivated. Therefore
the driver handles the delay drop timeout event and reactivate it.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Introduce general notification event
Maor Gottlieb [Tue, 30 May 2017 07:29:12 +0000 (10:29 +0300)]
net/mlx5: Introduce general notification event

When delay drop timeout is expired, the firmware raises
general notification event of DELAY_DROP_TIMEOUT subtype.
In addition the feature is disable so the driver have to
reactivate the timeout.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Introduce set delay drop command
Maor Gottlieb [Tue, 30 May 2017 07:29:11 +0000 (10:29 +0300)]
net/mlx5: Introduce set delay drop command

Add support to SET_DELAY_DROP command.

This command will be used in downstream patches for delay packet drop.
The timeout value should be indicated by delay_drop_timeout field.
Packet processing will be delayed till timeout value passed or until
more WQEs are posted.

Setting this value to 0 disables the feature.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Introduce delay drop for a WQ
Maor Gottlieb [Tue, 30 May 2017 07:29:10 +0000 (10:29 +0300)]
IB/core: Introduce delay drop for a WQ

Work queue which is created with IB_WQ_FLAGS_DELAY_DROP won't
cause packet drops when there aren't receive WQEs, but will wait until
posting of receive WQEs or for some period of time that the device
was configured with.

It includes:
 * Add a new creation flag to enable delay drop functionality in a WQ.
 * A new capability was introduced - IB_RAW_PACKET_CAP_DELAY_DROP, which
   is the device's ability to delay packet drops when there aren't receive
   WQEs.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Restore IB guid/policy for virtual functions
Bodong Wang [Tue, 30 May 2017 07:18:24 +0000 (10:18 +0300)]
IB/mlx5: Restore IB guid/policy for virtual functions

When a user sets port_guid, node_guid or policy of an IB virtual
function, save this information in "struct mlx5_vf_context".

This information will be restored later when pci_resume is called.
To make sure this works, one can use aer-inject to generate PCI
errors on mlx5 devices and verify if relevant fields are restored
after PCI resume.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add debug control parameters for congestion control
Parav Pandit [Tue, 30 May 2017 07:05:15 +0000 (10:05 +0300)]
IB/mlx5: Add debug control parameters for congestion control

This patch adds debug control parameters for congestion control which
can be read or written through debugfs. They are for reaction point and
notification point nodes.

These control parameters are as below:
 +------------------------------+-----------------------------------------+
 |      Name                    |           Description                   |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate             | When set target rate is updated to      |
 |                              | current rate                            |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate_ati         | When set update target rate based on    |
 |                              | timer as well                           |
 |------------------------------+-----------------------------------------|
 |rp_time_reset                 | time between rate increase if no        |
 |                              | CNP is received unit in usec            |
 |------------------------------+-----------------------------------------|
 |rp_byte_reset                 | Number of bytes between rate inease if  |
 |                              | no CNP is received                      |
 |------------------------------+-----------------------------------------|
 |rp_threshold                  | Threshold for reaction point rate       |
 |                              | control                                 |
 |------------------------------+-----------------------------------------|
 |rp_ai_rate                    | Rate for target rate, unit in Mbps      |
 |------------------------------+-----------------------------------------|
 |rp_hai_rate                   | Rate for hyper increase state           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_min_dec_fac                | Minimum factor by which the current     |
 |                              | transmit rate can be changed when       |
 |                              | processing a CNP, unit is percerntage   |
 |------------------------------+-----------------------------------------|
 |rp_min_rate                   | Minimum value for rate limit,           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_to_set_on_first_cnp   | Rate that is set when first CNP is      |
 |                              | received, unit is Mbps                  |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_g                  | Used to calculate alpha                 |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_rtt                | Time between updates of alpha value,    |
 |                              | unit is usec                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_reduce_monitor_period | Minimum time between consecutive rate   |
 |                              | reductions                              |
 |------------------------------+-----------------------------------------|
 |rp_initial_alpha_value        | Initial value of alpha                  |
 |------------------------------+-----------------------------------------|
 |rp_gd                         | When CNP is received, flow rate is      |
 |                              | reduced based on gd, rp_gd is given as  |
 |                              | log2(rp_gd)                             |
 |------------------------------+-----------------------------------------|
 |np_cnp_dscp                   | dscp code point for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio_mode              | 802.1p priority for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio                   | cnp priority mode                       |
 +------------------------------+-----------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Change logic for dispatching IB events for port state
Moni Shoua [Tue, 30 May 2017 06:56:05 +0000 (09:56 +0300)]
IB/mlx5: Change logic for dispatching IB events for port state

The old logic ignored link state. This led to missing IB events like
when link goes down on the switch while admin state is up or to redundant
events like when admin state goes up while link is down.
To fix that, probe the port state on NETDEV events and compare to last
known state to decide if IB events needs to be dispatched.

FIxes: 5ec8c83e3ad3 ("IB/mlx5: Port events in RoCE now rely on netdev events")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5e: Enable local loopback in loopback selftest
Huy Nguyen [Tue, 30 May 2017 06:42:55 +0000 (09:42 +0300)]
net/mlx5e: Enable local loopback in loopback selftest

Before running the ethtool's loopback selftest, we need
to make sure that the local loopback is enabled.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add raw ethernet local loopback support
Huy Nguyen [Tue, 30 May 2017 06:42:54 +0000 (09:42 +0300)]
IB/mlx5: Add raw ethernet local loopback support

Currently, unicast/multicast loopback raw ethernet
(non-RDMA) packets are sent back to the vport.
A unicast loopback packet is the packet with destination
MAC address the same as the source MAC address.
For multicast, the destination MAC address is in the
vport's multicast filter list.

Moreover, the local loopback is not needed if
there is one or none user space context.

After this patch, the raw ethernet unicast and multicast
local loopback are disabled by default. When there is more
than one user space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet
packets are not looped back to the vport and are forwarded
to the next routing level (eswitch, or multihost switch,
or out to the wire depending on the configuration).

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Add raw ethernet local loopback firmware command
Huy Nguyen [Tue, 30 May 2017 06:42:53 +0000 (09:42 +0300)]
net/mlx5: Add raw ethernet local loopback firmware command

Add support for raw ethernet local loopback firmware command.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA/bnxt_re: Allow posting when QPs are in error
Selvin Xavier [Thu, 29 Jun 2017 19:28:15 +0000 (12:28 -0700)]
RDMA/bnxt_re: Allow posting when QPs are in error

This  patch allows driver to post send and receive
requests on QPs which are in  error state.

Instead of flushing the QP in the context of polling
error CQEs, the QPs will be added to a flush list
maintained per CQ. QP state is moved to error.
QP is added to flush list if the user moves it
to error state using modify_qp also. After polling the HW
CQ in poll_cq routine, this flush list is traversed
and driver completes work requests on each QP in the flush
list, till the budget expires. The QP is moved out of
flush list during QP destroy or during modify_QP to RESET.

When ULPs post Work Requests while QP is in error state,
driver will store the ULP data and then increment the
QP producer s/w index, without ringing doorbell. It then
schedules a worker to invoke the CQ handler since the
interrupts wont be generated from the HW for this request.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configured
Kalesh AP [Thu, 29 Jun 2017 19:28:10 +0000 (12:28 -0700)]
RDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configured

Current implementation does not program vlan header insertion
in RoCE packet if no vlan is configured. Firmware does not add
prority when there is no vlan tag in the packet. Modify the code
to insert vlan header when PFC is enabled on the interface.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA: Remove useless MODULE_VERSION
Leon Romanovsky [Mon, 26 Jun 2017 05:58:22 +0000 (08:58 +0300)]
RDMA: Remove useless MODULE_VERSION

All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Add generic function to extract IB speed from netdev
Yuval Shaia [Wed, 14 Jun 2017 20:13:34 +0000 (23:13 +0300)]
IB/core: Add generic function to extract IB speed from netdev

Logic of retrieving netdev speed from net_device and translating it to
IB speed is implemented in rxe, in usnic and in bnxt drivers.

Define new function which merges all.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/usnic: Implement get_netdev hook
Yuval Shaia [Wed, 14 Jun 2017 20:13:33 +0000 (23:13 +0300)]
IB/usnic: Implement get_netdev hook

usnic's get_netdev hook for struct ib_device is missing - add it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/qib: remove duplicate code
Gustavo A. R. Silva [Wed, 7 Jun 2017 20:42:02 +0000 (15:42 -0500)]
IB/qib: remove duplicate code

Remove duplicate code.

Addresses-Coverity-ID: 1226951
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoRDMA/bnxt_re: Delete unsupported modify_port function
Leon Romanovsky [Thu, 1 Jun 2017 06:24:34 +0000 (09:24 +0300)]
RDMA/bnxt_re: Delete unsupported modify_port function

There is no need to return always zero for function which is not
supported. The IB stack treats uninitialized ib_device->functions as
not implemented.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/cma: Set default gid type to RoCEv2
Moni Shoua [Tue, 30 May 2017 06:47:34 +0000 (09:47 +0300)]
IB/cma: Set default gid type to RoCEv2

RoCEv2 is the preferred RDMA protocol for Ethernet link layer because
of its advantages over RoCEv1. For better user experience make it the
default choice for RDMA_CM connections if device/port support it.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Constify static rxe_vm_ops
Kamal Heib [Thu, 15 Jun 2017 08:29:07 +0000 (11:29 +0300)]
IB/rxe: Constify static rxe_vm_ops

Constify static rxe_vm_ops that is never modified.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Use __func__ to print function's name
Kamal Heib [Thu, 15 Jun 2017 08:29:06 +0000 (11:29 +0300)]
IB/rxe: Use __func__ to print function's name

Its better to use __func__ to print functions name instead of writing
the name in the print statement.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Use DEVICE_ATTR_RO macro to show parent field
Kamal Heib [Thu, 15 Jun 2017 08:29:05 +0000 (11:29 +0300)]
IB/rxe: Use DEVICE_ATTR_RO macro to show parent field

Use DEVICE_ATTR RO() macro and rename the show function accordingly.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Prefer 'unsigned int' to bare use of 'unsigned'
Kamal Heib [Thu, 15 Jun 2017 08:29:04 +0000 (11:29 +0300)]
IB/rxe: Prefer 'unsigned int' to bare use of 'unsigned'

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Use "foo *bar" instead of "foo * bar"
Kamal Heib [Thu, 15 Jun 2017 08:29:03 +0000 (11:29 +0300)]
IB/rxe: Use "foo *bar" instead of "foo * bar"

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoMerge branch 'hfi1' into k.o/for-4.14
Doug Ledford [Mon, 24 Jul 2017 12:33:43 +0000 (08:33 -0400)]
Merge branch 'hfi1' into k.o/for-4.14

7 years agoLinux 4.13-rc2
Linus Torvalds [Sun, 23 Jul 2017 23:15:17 +0000 (16:15 -0700)]
Linux 4.13-rc2

7 years agoProperly alphabetize MAINTAINERS file
Linus Torvalds [Sun, 23 Jul 2017 23:06:21 +0000 (16:06 -0700)]
Properly alphabetize MAINTAINERS file

This adds a perl script to actually parse the MAINTAINERS file, clean up
some whitespace in it, warn about errors in it, and then properly sort
the end result.

My perl-fu is atrocious, so the script has basically been created by
randomly putting various characters in a pile, mixing them around, and
then looking it the end result does anything interesting when used as a
perl script.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoFix up MAINTAINERS file problems
Linus Torvalds [Sun, 23 Jul 2017 22:08:05 +0000 (15:08 -0700)]
Fix up MAINTAINERS file problems

Prepping for scripting the MAINTAINERS file cleanup (and possible split)
showed a couple of cases where the headers for a couple of entries were
bogus.

There's a few different kinds of bogosities:

 - the X-GENE SOC EDAC case was confused and split over two lines

 - there were four entries for "GREYBUS PROTOCOLS DRIVERS" that were all
   different things.

 - the NOKIA N900 CAMERA SUPPORT" was duplicated

all of which were more obvious when you started doing associative arrays
in perl to track these things by the header (so that we can alphabetize
this thing properly, and so that we might split it up by the data too).

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 23 Jul 2017 18:22:45 +0000 (11:22 -0700)]
Merge tag 'for-linus-4.13b-rc2-tag' of git://git./linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Some fixes and cleanups for running under Xen"

* tag 'for-linus-4.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: don't online new memory initially
  xen/x86: fix cpu hotplug
  xen/grant-table: log the lack of grants
  xen/x86: Don't BUG on CPU0 offlining

7 years agoxen/balloon: don't online new memory initially
Juergen Gross [Mon, 10 Jul 2017 08:10:45 +0000 (10:10 +0200)]
xen/balloon: don't online new memory initially

When setting up the Xenstore watch for the memory target size the new
watch will fire at once. Don't try to reach the configured target size
by onlining new memory in this case, as the current memory size will
be smaller in almost all cases due to e.g. BIOS reserved pages.

Onlining new memory will lead to more problems e.g. undesired conflicts
with NVMe devices meant to be operated as block devices.

Instead remember the difference between target size and current size
when the watch fires for the first time and apply it to any further
size changes, too.

In order to avoid races between balloon.c and xen-balloon.c init calls
do the xen-balloon.c initialization from balloon.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoxen/x86: fix cpu hotplug
Juergen Gross [Wed, 5 Jul 2017 14:05:20 +0000 (16:05 +0200)]
xen/x86: fix cpu hotplug

Commit dc6416f1d711eb4c1726e845d653235dcaae12e1 ("xen/x86: Call
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()")
introduced an error leading to a stack overflow of the idle task when
a cpu was brought offline/online many times: by calling
cpu_startup_entry() instead of returning at the end of xen_play_dead()
do_idle() would be entered again and again.

Don't use cpu_startup_entry(), but cpuhp_online_idle() instead allowing
to return from xen_play_dead().

Cc: <stable@vger.kernel.org> # 4.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoxen/grant-table: log the lack of grants
Wengang Wang [Tue, 18 Jul 2017 07:40:35 +0000 (09:40 +0200)]
xen/grant-table: log the lack of grants

log a message when we enter this situation:
1) we already allocated the max number of available grants from hypervisor
and
2) we still need more (but the request fails because of 1)).

Sometimes the lack of grants causes IO hangs in xen_blkfront devices.
Adding this log would help debuging.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoxen/x86: Don't BUG on CPU0 offlining
Vitaly Kuznetsov [Mon, 26 Jun 2017 16:39:30 +0000 (18:39 +0200)]
xen/x86: Don't BUG on CPU0 offlining

CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows to offline CPU0 but Xen HVM guests
BUG() in xen_teardown_timer(). Remove the BUG_ON(), this is probably a
leftover from ancient times when CPU0 hotplug was impossible, it works
just fine for HVM.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
7 years agoMerge tag 'hwmon-for-linus-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 22 Jul 2017 16:25:00 +0000 (09:25 -0700)]
Merge tag 'hwmon-for-linus-v4.13-rc2' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
 "Avoid buffer overruns in applesmc driver"

* tag 'hwmon-for-linus-v4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (applesmc) Avoid buffer overruns

7 years agoMerge tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 22 Jul 2017 16:00:24 +0000 (09:00 -0700)]
Merge tag 'tty-4.13-rc2' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
  huge at all, a revert of a patch that turned out to break things, a
  fix up for a new tty ioctl we added in 4.13-rc1 to get the uapi
  definition correct, and a few minor serial driver fixes for reported
  issues.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: Fix TIOCGPTPEER ioctl definition
  tty: hide unused pty_get_peer function
  tty: serial: lpuart: Fix the logic for detecting the 32-bit type UART
  serial: imx: Prevent TX buffer PIO write when a DMA has been started
  Revert "serial: imx-serial - move DMA buffer configuration to DT"
  serial: sh-sci: Uninitialized variables in sysfs files
  serial: st-asc: Potential error pointer dereference

7 years agoMerge tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 22 Jul 2017 15:57:24 +0000 (08:57 -0700)]
Merge tag 'char-misc-4.13-rc1' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char and misc driver fixes for 4.13-rc2. All fix
  reported problems with 4.13-rc1 or older kernels (like the binder
  fixes). Full details in the shortlog.

  All have been in linux-next with no reported issues"

* tag 'char-misc-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  w1: omap-hdq: fix error return code in omap_hdq_probe()
  regmap: regmap-w1: Fix build troubles
  w1: Fix slave count on 1-Wire bus (resend)
  mux: mux-core: unregister mux_class in mux_exit()
  mux: remove the Kconfig question for the subsystem
  nvmem: rockchip-efuse: amend compatible rk322x-efuse to rk3228-efuse
  drivers/fsi: fix fsi_slave_mode prototype
  fsi: core: register with postcore_initcall
  thunderbolt: Correct access permissions for active NVM contents
  vmbus: re-enable channel tasklet
  spmi: pmic-arb: Always allocate ppid_to_apid table
  MAINTAINERS: Add entry for SPMI subsystem
  spmi: Include OF based modalias in device uevent
  binder: Use wake up hint for synchronous transactions.
  binder: use group leader instead of open thread
  Revert "android: binder: Sanity check at binder ioctl"

7 years agoMerge tag 'usb-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 22 Jul 2017 15:55:16 +0000 (08:55 -0700)]
Merge tag 'usb-4.13-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for 4.13-rc2.

  The usual batch, gadget fixes for reported issues, as well as xhci
  fixes, and a small random collection of other fixes for reported
  issues.

  All have been in linux-next with no reported issues"

* tag 'usb-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
  xhci: fix memleak in xhci_run()
  usb: xhci: fix spinlock recursion for USB2 test mode
  xhci: fix 20000ms port resume timeout
  usb: xhci: Issue stop EP command only when the EP state is running
  xhci: Bad Ethernet performance plugged in ASM1042A host
  xhci: Fix NULL pointer dereference when cleaning up streams for removed host
  usb: renesas_usbhs: gadget: disable all eps when the driver stops
  usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL
  usb: gadget: udc: renesas_usb3: protect usb3_ep->started in usb3_start_pipen()
  usb: gadget: udc: renesas_usb3: fix zlp transfer by the dmac
  usb: gadget: udc: renesas_usb3: fix free size in renesas_usb3_dma_free_prd()
  usb: gadget: f_uac2: endianness fixes.
  usb: gadget: f_uac1: endianness fixes.
  include: usb: audio: specify exact endiannes of descriptors
  usb: gadget: udc: start_udc() can be static
  usb: dwc2: gadget: On USB RESET reset device address to zero
  usb: storage: return on error to avoid a null pointer dereference
  usb: typec: include linux/device.h in ucsi.h
  USB: cdc-acm: add device-id for quirky printer
  usb: dwc3: gadget: only unmap requests from DMA if mapped
  ...

7 years agoMerge tag 'staging-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 22 Jul 2017 15:53:24 +0000 (08:53 -0700)]
Merge tag 'staging-4.13-rc2' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Here are some small staging driver fixes for reported issues for
  4.13-rc2.

  Also in here is a new driver, the virtualbox DRM driver. It's
  stand-alone and got acks from the DRM developers to go in through this
  tree. It's a new thing, but it should be fine for this point in the rc
  cycle due to it being independent.

  All of this has been in linux-next for a while with no reported
  issues"

* tag 'staging-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8188eu: add TL-WN722N v2 support
  staging: speakup: safely register and unregister ldisc
  staging: speakup: add functions to register and unregister ldisc
  staging: speakup: safely close tty
  staging: sm750fb: avoid conflicting vesafb
  staging: lustre: ko2iblnd: check copy_from_iter/copy_to_iter return code
  staging: vboxvideo: Add vboxvideo to drivers/staging
  staging: sm750fb: fixed a assignment typo
  staging: rtl8188eu: memory leak in rtw_free_cmd_obj()
  staging: vchiq_arm: fix error codes in probe
  staging: comedi: ni_mio_common: fix AO timer off-by-one regression

7 years agoMAINTAINERS: fix alphabetical ordering
Randy Dunlap [Fri, 21 Jul 2017 20:32:27 +0000 (13:32 -0700)]
MAINTAINERS: fix alphabetical ordering

Fix major alphabetic errors.  No attempt to fix items that all begin
with the same word (like ARM, BROADCOM, DRM, EDAC, FREESCALE, INTEL,
OMAP, PCI, SAMSUNG, TI, USB, etc.).

(diffstat +/- is different by one line because TI KEYSTONE MULTICORE
had 2 blank lines after it.)

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 21 Jul 2017 23:26:01 +0000 (16:26 -0700)]
Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfix:
   - Fix error reporting regression

  Bugfixes:
   - Fix setting filelayout ds address race
   - Fix subtle access bug when using ACLs
   - Fix setting mnt3_counts array size
   - Fix a couple of pNFS commit races"

* tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
  NFS: Be more careful about mapping file permissions
  NFS: Store the raw NFS access mask in the inode's access cache
  NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
  NFS: Refactor NFS access to kernel access mask calculation
  net/sunrpc/xprt_sock: fix regression in connection error reporting.
  nfs: count correct array for mnt3_counts array size
  Revert commit 722f0b891198 ("pNFS: Don't send COMMITs to the DSes if...")
  pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
  NFS: Fix another COMMIT race in pNFS
  NFS: Fix a COMMIT race in pNFS
  mount: copy the port field into the cloned nfs_server structure.
  NFS: Don't run wake_up_bit() when nobody is waiting...
  nfs: add export operations

7 years agoMerge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszer...
Linus Torvalds [Fri, 21 Jul 2017 23:24:22 +0000 (16:24 -0700)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "This fixes a crash with SELinux and several other old and new bugs"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check for bad and whiteout index on lookup
  ovl: do not cleanup directory and whiteout index entries
  ovl: fix xattr get and set with selinux
  ovl: remove unneeded check for IS_ERR()
  ovl: fix origin verification of index dir
  ovl: mark parent impure on ovl_link()
  ovl: fix random return value on mount

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 21 Jul 2017 23:20:05 +0000 (16:20 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A small set of fixes for -rc2 - two fixes for BFQ, documentation and
  code, and a removal of an unused variable in nbd. Outside of that, a
  small collection of fixes from the usual crew on the nvme side"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: don't report 0-bytes in serial number
  nvmet: preserve controller serial number between reboots
  nvmet: Move serial number from controller to subsystem
  nvmet: prefix version configfs file with attr
  nvme-pci: Fix an error handling path in 'nvme_probe()'
  nvme-pci: Remove nvme_setup_prps BUG_ON
  nvme-pci: add another device ID with stripe quirk
  nvmet-fc: fix byte swapping in nvmet_fc_ls_create_association
  nvme: fix byte swapping in the streams code
  nbd: kill unused ret in recv_work
  bfq: dispatch request to prevent queue stalling after the request completion
  bfq: fix typos in comments about B-WF2Q+ algorithm

7 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Linus Torvalds [Fri, 21 Jul 2017 21:22:05 +0000 (14:22 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/dledford/rdma

Pull more rdma fixes from Doug Ledford:
 "As per my previous pull request, there were two drivers that each had
  a rather large number of legitimate fixes still to be sent.

  As it turned out, I also missed a reasonably large set of fixes from
  one person across the stack that are all important fixes. All in all,
  the bnxt_re, i40iw, and Dan Carpenter are 3/4 to 2/3rds of this pull
  request.

  There were some other random fixes that I didn't send in the last pull
  request that I added to this one. This catches the rdma stack up to
  the fixes from up to about the beginning of this week. Any more fixes
  I'll wait and batch up later in the -rc cycle. This will give us a
  good base to start with for basing a for-next branch on -rc2.

  Summary:

   - i40iw fixes

   - bnxt_re fixes

   - Dan Carpenter bugfixes across stack

   - ten more random fixes, no more than two from any one person"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  RDMA/core: Initialize port_num in qp_attr
  RDMA/uverbs: Fix the check for port number
  IB/cma: Fix reference count leak when no ipv4 addresses are set
  RDMA/iser: don't send an rkey if all data is written as immadiate-data
  rxe: fix broken receive queue draining
  RDMA/qedr: Prevent memory overrun in verbs' user responses
  iw_cxgb4: don't use WR keys/addrs for 0 byte reads
  IB/mlx4: Fix CM REQ retries in paravirt mode
  IB/rdmavt: Setting of QP timeout can overflow jiffies computation
  IB/core: Fix sparse warnings
  RDMA/bnxt_re: Fix the value reported for local ack delay
  RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq
  RDMA/bnxt_re: Fix return value of poll routine
  RDMA/bnxt_re: Enable atomics only if host bios supports
  RDMA/bnxt_re: Specify RDMA component when allocating stats context
  RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP
  RDMA/bnxt_re: Report supported value to IB stack in query_device
  RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
  RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
  RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
  ...

7 years agoMerge tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Fri, 21 Jul 2017 21:16:42 +0000 (14:16 -0700)]
Merge tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A bunch of fixes for rc2: two imx regressions, vc4 fix, dma-buf fix,
  some displayport mst fixes, and an amdkfd fix.

  Nothing too crazy, I assume we just haven't see much rc1 testing yet"

* tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux:
  drm/mst: Avoid processing partially received up/down message transactions
  drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
  drm/mst: Fix error handling during MST sideband message reception
  drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failure
  drm/imx: fix typo in ipu_plane_formats[]
  drm/vc4: Fix VBLANK handling in crtc->enable() path
  dma-buf/fence: Avoid use of uninitialised timestamp
  drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec
  drm/radeon: Remove initialization of shared_resources.num_mec
  drm/amdkfd: Remove unused references to shared_resources.num_mec
  drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

7 years agoMerge tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 21 Jul 2017 20:59:51 +0000 (13:59 -0700)]
Merge tag 'trace-v4.13-rc1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Three minor updates

   - Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating
     memory for the ring buffer without causing OOMs

   - Fix a memory leak in adding and removing instances

   - Add __rcu annotation to be able to debug RCU usage of function
     tracing a bit better"

* tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  trace: fix the errors caused by incompatible type of RCU variables
  tracing: Fix kmemleak in instance_rmdir
  tracing/ring_buffer: Try harder to allocate