Sagi Grimberg [Thu, 6 Aug 2015 15:32:52 +0000 (18:32 +0300)]
IB/iser: Remove an unneeded print for unaligned memory
We can do it in iser_aligned_data_len instead and
it will save us an argument that is passed to
fall_to_counce_buf just for the print.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:51 +0000 (18:32 +0300)]
IB/iser: Remove a redundant always-false condition
We always call iser_initialize_task_headers() and set
the header tx_sg.lkey to the device mr lkey, so no
point in checking it in iser_create_send_desc().
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:50 +0000 (18:32 +0300)]
IB/iser: Fix possible bogus DMA unmapping
If iser_initialize_task_headers() routine failed before
dma mapping, we should not attempt to unmap in cleanup_task().
Fixes:
7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:49 +0000 (18:32 +0300)]
IB/iser: Get rid of un-maintained counters
We don't update those anywhere in the code and they
seem pretty useless (no one seem to care about those).
qp_tx_queue_full: We never should get this
fmr_map_not_avail: We can never get to this
eh_abort_cnt: We don't monitor aborts
Go ahead and remove them.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:48 +0000 (18:32 +0300)]
IB/iser: Fix missing return status check in iser_send_data_out
Since commit "IB/iser: Fix race between iser connection teardown..."
iser_initialize_task_headers() might fail, so we need to check that.
Fixes:
7414dde0a6c3a958e (IB/iser: Fix race between iser connection ...)
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:47 +0000 (18:32 +0300)]
IB/iser: Remove '.' from log message
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 6 Aug 2015 15:32:46 +0000 (18:32 +0300)]
IB/iser: Change minor assignments and logging prints
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jenny Falkovich [Thu, 6 Aug 2015 15:32:45 +0000 (18:32 +0300)]
IB/iser: Change some module parameters to be RO
While we're at it, use permission defines instead
of octal values and rearrange a little bit.
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:09 +0000 (08:52 -0400)]
IB/sa: Route SA pathrecord query through netlink
This patch routes a SA pathrecord query to netlink first and processes the
response appropriately. If a failure is returned, the request will be sent
through IB. The decision whether to route the request to netlink first is
determined by the presence of a listener for the local service netlink
multicast group. If the user-space local service netlink multicast group
listener is not present, the request will be sent through IB, just like
what is currently being done.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:08 +0000 (08:52 -0400)]
IB/sa: Allocate SA query with kzalloc
Replace kmalloc with kzalloc so that all uninitialized fields in SA query
will be zero-ed out to avoid unintentional consequence. This prepares the
SA query structure to accept new fields in the future.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:07 +0000 (08:52 -0400)]
IB/core: Add rdma netlink helper functions
This patch adds a function to check if listeners for a netlink multicast
group are present. It also adds a function to receive netlink response
messages.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Kaike Wan [Fri, 14 Aug 2015 12:52:06 +0000 (08:52 -0400)]
IB/netlink: Add defines for local service requests through netlink
This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 14 Aug 2015 18:01:09 +0000 (11:01 -0700)]
IB/srp: Stop the scsi_eh_<n> and scsi_tmf_<n> threads if login fails
scsi_host_alloc() not only allocates memory for a SCSI host but also
creates the scsi_eh_<n> kernel thread and the scsi_tmf_<n> workqueue.
Stop these threads if login fails by calling scsi_host_put().
Reported-by: Konstantin Krotov <kkv@clodo.ru>
Fixes:
fb49c8bbaae7 ("Remove an extraneous scsi_host_put() from an error path")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:13:52 +0000 (14:13 -0700)]
IB/srp: Bump driver version and release date
Since version 1.0 e.g. scsi-mq has been added. Since this is
a significant change, bump the driver version and release date.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:13:22 +0000 (14:13 -0700)]
IB/srp: Handle partial connection success correctly
Avoid that the following kernel warning is reported if the SRP
target system accepts fewer channels per connection than what
was requested by the initiator system:
WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xb1/0x120 [ib_srp]()
Call Trace:
[<
ffffffff8105d67f>] warn_slowpath_common+0x7f/0xc0
[<
ffffffff8105d6da>] warn_slowpath_null+0x1a/0x20
[<
ffffffffa05419e1>] srp_destroy_qp+0xb1/0x120 [ib_srp]
[<
ffffffffa05445fb>] srp_create_ch_ib+0x19b/0x420 [ib_srp]
[<
ffffffffa0545257>] srp_create_target+0x7d7/0xa94 [ib_srp]
[<
ffffffff8138dac0>] dev_attr_store+0x20/0x30
[<
ffffffff812079ef>] sysfs_write_file+0xef/0x170
[<
ffffffff81191fc4>] vfs_write+0xb4/0x130
[<
ffffffff8119276f>] sys_write+0x5f/0xa0
[<
ffffffff815a0a59>] system_call_fastpath+0x16/0x1b
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Bart Van Assche [Fri, 31 Jul 2015 21:12:48 +0000 (14:12 -0700)]
IB/srp: Constify a function argument
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ariel Nahum [Sun, 9 Aug 2015 08:16:27 +0000 (11:16 +0300)]
IB/mlx4: Fix incorrect cq flushing in error state
When handling a device internal error, the driver is responsible to
drain the completion queue with flush errors.
In case a completion queue was assigned to multiple send queues, the
driver iterates over the send queues and generates flush errors of
inflight wqes. The driver must correctly pass the wc array with an
offset as a result of the previous send queue iteration. Not doing so
will overwrite previously set completions and return a wrong number
of polled completions which includes ones which were not correctly set.
Fixes:
35f05dabf95a (IB/mlx4: Reset flow support for IB kernel ULPs)
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Noa Osherovich [Thu, 30 Jul 2015 14:34:24 +0000 (17:34 +0300)]
IB/mlx4: Use correct SL on AH query under RoCE
The mlx4 IB driver implementation for ib_query_ah used a wrong offset
(28 instead of 29) when link type is Ethernet. Fixed to use the correct one.
Fixes:
fa417f7b520e ('IB/mlx4: Add support for IBoE')
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:23 +0000 (17:34 +0300)]
IB/mlx4: Forbid using sysfs to change RoCE pkeys
The pkey mapping for RoCE must remain the default mapping:
VFs:
virtual index 0 = mapped to real index 0 (0xFFFF)
All others indices: mapped to a real pkey index containing an
invalid pkey.
PF:
virtual index i = real index i.
Don't allow users to change these mappings using files found in
sysfs.
Fixes:
c1e7e466120b ('IB/mlx4: Add iov directory in sysfs under the ib device')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:22 +0000 (17:34 +0300)]
IB/mlx4: Demote mcg message from warning to debug
The mcg "too many pending requests" warning message fills the log
when OpenSM is downed. Demote the message from warning level to
debug level.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Thu, 30 Jul 2015 14:34:21 +0000 (17:34 +0300)]
IB/mlx4: Fix potential deadlock when sending mad to wire
send_mad_to_wire takes the same spinlock that is taken in
the interrupt context. Therefore, it needs irqsave/restore.
Fixes:
b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Sat, 15 Aug 2015 14:16:14 +0000 (10:16 -0400)]
IB/core: Remove needless bracketization
Signed-off-by: Doug Ledford <dledford@redhat.com>
Somnath Kotur [Thu, 30 Jul 2015 15:33:31 +0000 (18:33 +0300)]
RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/Core
1.Change query_gid hook to return value from IB/Core GID
management APIs.
2.Get rid of all the netdev notifier chain subscription code as well
as maintenance of SGID Table in memory.
3.Implement get_netdev hook in driver.
Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:30 +0000 (18:33 +0300)]
IB/mlx4: Replace mechanism for RoCE GID management
Manage RoCE gid table with logic in IB/core, which is common to all
vendors, and remove the mechanism from the mlx4 IB driver.
Since management of the GID cache may lead to index mismatch with the
hardware GID table, a translation between indexes is required when
modifying a QP or creating an address handle.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:29 +0000 (18:33 +0300)]
IB/mlx4: Implement ib_device callbacks
get_netdev: get the net_device on the physical port of the IB transport port. In
port aggregation mode it is required to return the netdev of the active port.
modify_gid: note for a change in the RoCE gid cache. Handle this by writing to
the harsware GID table. It is possible that indexes in cahce and hardware tables
won't match so a translation is required when modifying a QP or creating an
address handle.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moni Shoua [Thu, 30 Jul 2015 15:33:28 +0000 (18:33 +0300)]
net/mlx4: Postpone the registration of net_device
The mlx4 network driver was registered in the context of the 'add'
function of the core driver (called when HW should be registered).
This makes the netdev event NETDEV_REGISTER to be sent in a context
where the answer to get_protocol_dev() callback returns NULL. This may
be confusing to listeners of netdev events.
This patch is a preparation to the patch that implements the
get_netdev() callback in the IB/mlx4 driver.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:27 +0000 (18:33 +0300)]
IB/core: Add RoCE table bonding support
Handling bonding and other devices require us to all all GIDs of the
net-devices which are upper-devices of the RoCE port related
net-device.
Active-backup configurations imposes even more challenges as the
default GID should only be set on the active devices (this is
necessary as otherwise the same MAC could be used for several
slaves and thus several slaves will have identical GIDs).
Managing these configurations are done by listening to:
(a) NETDEV_CHANGEUPPER event
(1) if a related net-device is linked, delete all inactive
slaves default GIDs and add the upper device GIDs.
(2) if a related net-device is unlinked, delete all upper GIDs
and add the default GIDs.
(b) NETDEV_BONDING_FAILOVER:
(1) delete the bond GIDs from inactive slaves
(2) delete the inactive slave's default GIDs
(3) Add the bond GIDs to the active slave.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dan Carpenter [Tue, 18 Aug 2015 09:22:10 +0000 (12:22 +0300)]
IB/core: missing curly braces in ib_find_gid()
Smatch says that, based on the indenting, we should probably add curly
braces here.
Fixes:
03db3a2d81e6 ('IB/core: Add RoCE GID table management')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:26 +0000 (18:33 +0300)]
IB/core: Add RoCE GID table management
RoCE GIDs are based on IP addresses configured on Ethernet net-devices
which relate to the RDMA (RoCE) device port.
Currently, each of the low-level drivers that support RoCE (ocrdma,
mlx4) manages its own RoCE port GID table. As there's nothing which is
essentially vendor specific, we generalize that, and enhance the RDMA
core GID cache to do this job.
In order to populate the GID table, we listen for events:
(a) netdev up/down/change_addr events - if a netdev is built onto
our RoCE device, we need to add/delete its IPs. This involves
adding all GIDs related to this ndev, add default GIDs, etc.
(b) inet events - add new GIDs (according to the IP addresses)
to the table.
For programming the port RoCE GID table, providers must implement
the add_gid and del_gid callbacks.
RoCE GID management requires us to state the associated net_device
alongside the GID. This information is necessary in order to manage
the GID table. For example, when a net_device is removed, its
associated GIDs need to be removed as well.
RoCE mandates generating a default GID for each port, based on the
related net-device's IPv6 link local. In contrast to the GID based on
the regular IPv6 link-local (as we generate GID per IP address),
the default GID is also available when the net device is down (in
order to support loopback).
Locking is done as follows:
The patch modify the GID table code both for new RoCE drivers
implementing the add_gid/del_gid callbacks and for current RoCE and
IB drivers that do not. The flows for updating the table are
different, so the locking requirements are too.
While updating RoCE GID table, protection against multiple writers is
achieved via mutex_lock(&table->lock). Since writing to a table
requires us to find an entry (possible a free entry) in the table and
then modify it, this mutex protects both the find_gid and write_gid
ensuring the atomicity of the action.
Each entry in the GID cache is protected by rwlock. In RoCE, writing
(usually results from netdev notifier) involves invoking the vendor's
add_gid and del_gid callbacks, which could sleep.
Therefore, an invalid flag is added for each entry. Updates for RoCE are
done via a workqueue, thus sleeping is permitted.
In IB, updates are done in write_lock_irq(&device->cache.lock), thus
write_gid isn't allowed to sleep and add_gid/del_gid are not called.
When passing net-device into/out-of the GID cache, the device
is always passed held (dev_hold).
The code uses a single work item for updating all RDMA devices,
following a netdev or inet notifier.
The patch moves the cache from being a client (which was incorrect,
as the cache is part of the IB infrastructure) to being explicitly
initialized/freed when a device is registered/removed.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Tue, 4 Aug 2015 21:23:34 +0000 (15:23 -0600)]
IB/core: Make ib_alloc_device init the kobject
This gets rid of the weird in-between state where struct ib_device
was allocated but the kobject didn't work.
Consequently ib_device_release is now guaranteed to be called in
all situations and we needn't duplicate its kfrees on error paths.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:24 +0000 (18:33 +0300)]
net/bonding: Export bond_option_active_slave_get_rcu
Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:23 +0000 (18:33 +0300)]
net: Add info for NETDEV_CHANGEUPPER event
Some consumers of NETDEV_CHANGEUPPER event would like to know which
upper device was linked/unlinked and what operation was carried.
Add information in the notifier info block for that purpose.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Matan Barak [Thu, 30 Jul 2015 15:33:22 +0000 (18:33 +0300)]
net/ipv6: Export addrconf_ifid_eui48
For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.
addrconf_ifid_eui48 is used to gernerate this address, export it.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:48 +0000 (10:32 +0300)]
IB/core: Drop ib_alloc_fast_reg_mr
Fully replaced by a more generic and suitable
ib_alloc_mr.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Fri, 7 Aug 2015 14:51:25 +0000 (10:51 -0400)]
IB/hfi1: Support ib_alloc_mr verb
Ported from upstream qib commit
68c02e232b8a ("qib: Support ib_alloc_mr verb")
Tested-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:47 +0000 (10:32 +0300)]
qib: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:46 +0000 (10:32 +0300)]
nes: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:45 +0000 (10:32 +0300)]
cxgb3: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:44 +0000 (10:32 +0300)]
iw_cxgb4: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:43 +0000 (10:32 +0300)]
ocrdma: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:42 +0000 (10:32 +0300)]
mlx4: Support ib_alloc_mr verb
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:41 +0000 (10:32 +0300)]
mlx5: Drop mlx5_ib_alloc_fast_reg_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:40 +0000 (10:32 +0300)]
RDS: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Fri, 7 Aug 2015 16:11:20 +0000 (11:11 -0500)]
svcrdma: limit FRMR page list lengths to device max
Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:39 +0000 (10:32 +0300)]
xprtrdma, svcrdma: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:38 +0000 (10:32 +0300)]
IB/srp: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:37 +0000 (10:32 +0300)]
iser-target: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:36 +0000 (10:32 +0300)]
IB/iser: Convert to ib_alloc_mr
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:35 +0000 (10:32 +0300)]
IB: Modify ib_create_mr API
Use ib_alloc_mr with specific parameters.
Change the existing callers.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Thu, 30 Jul 2015 07:32:34 +0000 (10:32 +0300)]
IB/core: Get rid of redundant verb ib_destroy_mr
This was added in a thought of uniting all mr allocation
and deallocation routines but the fact is we have a single
deallocation routine already, ib_dereg_mr.
And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr
(includes only signature stuff for now).
And, fixup the only callers (iser/isert) accordingly.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 27 Aug 2015 12:55:15 +0000 (15:55 +0300)]
IB/cma: Fix net_dev reference leak with failed requests
When no matching listening ID is found for a given request, the net_dev
that was used to find the request isn't released.
Fixes:
0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:26 +0000 (17:50 +0300)]
IB/cm: Remove compare_data checks
Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:25 +0000 (17:50 +0300)]
IB/cma: Share ib_cm_ids between rdma_cm_ids
Use ib_cm_insert_listen to create listening IB CM IDs or share existing
ones if needed. When given a request on a specific CM ID, the code now
matches the request to the RDMA CM ID based on the request parameters, so
it no longer needs to rely on the ib_cm's private data matching
capabilities.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:24 +0000 (17:50 +0300)]
IB/cma: Use found net_dev for passive connections
When receiving a new connection in cma_req_handler, we actually already
know the net_dev that is used for the connection's creation. Instead of
calling cma_translate_addr to resolve the new connection id's source
address, just use the net_dev that was found.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:23 +0000 (17:50 +0300)]
IB/cma: Validate routing of incoming requests
Pass incoming request parameters through the relevant IPv4/IPv6 routing
tables and make sure the network stack is configured to handle such
requests.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:22 +0000 (17:50 +0300)]
IB/cma: Add net_dev and private data checks to RDMA CM
Instead of relying on a the ib_cm module to check an incoming CM request's
private data header, add these checks to the RDMA CM module. This allows a
following patch to to clean up the ib_cm interface and remove the code that
looks into the private headers. It will also allow supporting namespaces in
RDMA CM by making these checks namespace aware later on.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:21 +0000 (17:50 +0300)]
IB/cm: Expose BTH P_Key in CM and SIDR request events
The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.
See discussion at:
http://www.spinics.net/lists/netdev/msg336067.html
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:20 +0000 (17:50 +0300)]
IB/cma: Helper functions to access port space IDRs
Add helper functions to access the IDRs by port-space and port number.
Pass around the port-space enum in cma.c instead of using pointers to
port-space IDRs.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:19 +0000 (17:50 +0300)]
IB/cma: Refactor RDMA IP CM private-data parsing code
When receiving a connection request, rdma_cm needs to associate the request
with a network device, in order to disambiguate requests. To do this, it
needs to know the request's destination IP. For this the module needs to
allow getting this information from the private data in the request packet,
instead of relying on the information already being in the listening RDMA
CM ID.
When creating a new incoming connection ID, the code in
cma_save_ip{4,6}_info can no longer rely on the listener's private data to
find the port number, so it reads it from the requested service ID.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:18 +0000 (17:50 +0300)]
IB/cm: Share listening CM IDs
Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.
This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].
[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
http://www.spinics.net/lists/netdev/msg328860.html
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:17 +0000 (17:50 +0300)]
IB/cm: Expose service ID in request events
Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Guy Shapiro [Thu, 30 Jul 2015 14:50:16 +0000 (17:50 +0300)]
IB/ipoib: Return IPoIB devices matching connection parameters
Implement the get_net_device_by_port_pkey_ip callback that returns network
device to ib_core according to connection parameters. Check the ipoib
device and iterate over all child devices to look for a match.
For each IPoIB device we iterate through all upper devices when searching
for a matching IP, in order to support bonding.
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Yotam Kenneth [Thu, 30 Jul 2015 14:50:15 +0000 (17:50 +0300)]
IB/core: Find the network device matching connection parameters
In the case of IPoIB, and maybe in other cases, the network device is
managed by an upper-layer protocol (ULP). In order to expose this
network device to other users of the IB device, let ULPs implement
a callback that returns network device according to connection parameters.
The IB device and port, together with the P_Key and the GID should
be enough to uniquely identify the ULP net device. However, in current
kernels there can be multiple IPoIB interfaces created with the same GID.
Furthermore, such configuration may be desireable to support ipvlan-like
configurations for RDMA CM with IPoIB. To resolve the device in these
cases the code will also take the IP address as an additional input.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:14 +0000 (17:50 +0300)]
IB/core: lock client data with lists_rwsem
An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.
Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.
Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Haggai Eran [Thu, 30 Jul 2015 14:50:13 +0000 (17:50 +0300)]
IB/core: Add rwsem to allow reading device list or client list
Currently the RDMA subsystem's device list and client list are protected by
a single mutex. This prevents adding user-facing APIs that iterate these
lists, since using them may cause a deadlock. The patch attempts to solve
this problem by adding a read-write semaphore to protect the lists. Readers
now don't need the mutex, and are safe just by read-locking the semaphore.
The ib_register_device, ib_register_client, ib_unregister_device, and
ib_unregister_client functions are modified to lock the semaphore for write
during their respective list modification. Also, in order to make sure
client callbacks are called only between add() and remove() calls, the code
is changed to only add items to the lists after the add() calls and remove
from the lists before the remove() calls.
This patch attempts to solve a similar need [1] that was seen in the RoCE
v2 patch series.
[1] http://www.spinics.net/lists/linux-rdma/msg24733.html
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:18 +0000 (18:10 -0500)]
RDMA/Core: remove rdma_cap_read_multi_sge() helper
This functionality already exists via the max_sge_rd
device capability.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:12 +0000 (18:10 -0500)]
svcrdma: Use max_sge_rd for destination read depths
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Mon, 27 Jul 2015 23:10:07 +0000 (18:10 -0500)]
ipath,qib: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Mon, 27 Jul 2015 23:10:01 +0000 (18:10 -0500)]
mlx4, mlx5, mthca: Expose max_sge_rd correctly
Applications must not assume that max_sge and max_sge_rd are the same,
Hence expose max_sge_rd correctly as well.
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jeff Becker [Fri, 21 Aug 2015 19:26:22 +0000 (12:26 -0700)]
staging/hfi1: replace indent spaces with tabs
Running checkpatch.pl on mad.c produces several
"ERROR: code indent should use tabs where possible" messages.
This patch fixes these.
Signed-off-by: Jeff Becker <Jeffrey.C.Becker@nasa.gov>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 30 Jul 2015 19:17:43 +0000 (15:17 -0400)]
IB/hfi1: add driver files
Signed-off-by: Andrew Friedley <andrew.friedley@intel.com>
Signed-off-by: Arthur Kepner <arthur.kepner@intel.com>
Signed-off-by: Brendan Cunningham <brendan.cunningham@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Caz Yokoyama <caz.yokoyama@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jim Snow <jim.m.snow@intel.com>
Signed-off-by: John Gregor <john.a.gregor@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Kevin Pine <kevin.pine@intel.com>
Signed-off-by: Kyle Liddell <kyle.liddell@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ravi Krishnaswamy <ravi.krishnaswamy@intel.com>
Signed-off-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Sanath Kumar <sanath.s.kumar@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Vlad Danushevsky <vladimir.danusevsky@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Thu, 30 Jul 2015 19:17:32 +0000 (15:17 -0400)]
IB/core: Add core header changes needed for OPA
This patch adds the value of the CNP opcode to the existing list of enumerated
opcodes in ib_pack.h
Add common OPA header definitions for driver
build:
- opa_port_info.h
- opa_smi.h
- hfi1_user.h
Additionally, ib_mad.h, has additional definitions
that are common to ib_drivers including:
- trap support
- cca support
The qib driver has the duplication removed in favor
those in ib_mad.h
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Wed, 29 Jul 2015 14:44:14 +0000 (09:44 -0500)]
RDMA/amso1100: Deprecate the amso1100 driver and move to staging
The HW hasn't been sold since 2005, and the SW has definite bit rot.
Its time to remove it. So move it to staging for a few releases and
then remove it after that.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dennis Dalessandro [Thu, 30 Jul 2015 13:25:42 +0000 (09:25 -0400)]
IB/ipath: Deprecate ipath driver and move to staging.
It is now time for the ipath driver to begin to be phased out of the kernel.
This patch moves the ipath driver from the Infiniband sub tree to the staging
area where it will remain until the code is removed from the kernel in a few
releases.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Doug Ledford [Thu, 27 Aug 2015 18:18:57 +0000 (14:18 -0400)]
Staging: Add staging/rdma directory and update MAINTAINERS
Create the rdma directory in the staging area for use as we deprecate
some older drivers and as we bring in some new drivers that are in
need of work. Update the MAINTAINERS file so that updates to these
files go to linux-rdma@vger.kernel.org. Expected lifespan of this
directory is three releases for any deprecated drivers moved here
and an unknown, but theoretically bounded amount of time for the new
drivers as a new core RDMA transfer library needs to be written and
the drivers modified to use it in order for them to move out of this
directory.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Tue, 25 Aug 2015 08:38:23 +0000 (14:08 +0530)]
iw_cxgb4: Add support for clip
Add support for ipv6 address handling clip api provided by lld
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Spencer Baugh [Thu, 13 Aug 2015 19:19:10 +0000 (12:19 -0700)]
RDMA/cma: fix IPv6 address resolution
Resolving a link-local IPv6 address with an unspecified source address
was broken by commit
5462eddd7a, which prevented the IPv6 stack from
learning the scope id of the link-local IPv6 address, causing random
failures as the IP stack chose a random link to resolve the address on.
This commit
5462eddd7a made us bail out of cma_check_linklocal early if
the address passed in was not an IPv6 link-local address. On the address
resolution path, the address passed in is the source address; if the
source address is the unspecified address, which is not link-local, we
will bail out early.
This is mostly correct, but if the destination address is a link-local
address, then we will be following a link-local route, and we'll need to
tell the IPv6 stack what the scope id of the destination address is.
This used to be done by last line of cma_check_linklocal, which is
skipped when bailing out early:
dev_addr->bound_dev_if = sin6->sin6_scope_id;
(In cma_bind_addr, the sin6_scope_id of the source address is set to the
sin6_scope_id of the destination address, so this is correct)
This line is required in turn for the following line, L279 of
addr6_resolve, to actually inform the IPv6 stack of the scope id:
fl6.flowi6_oif = addr->bound_dev_if;
Since we can only know we are in this failure case when we have access
to both the source IPv6 address and destination IPv6 address, we have to
deal with this further up the stack. So detect this failure case in
cma_bind_addr, and set bound_dev_if to the destination address scope id
to correct it.
Signed-off-by: Spencer Baugh <sbaugh@catern.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jason Gunthorpe [Tue, 4 Aug 2015 23:13:32 +0000 (17:13 -0600)]
IB/ucma: Fix theoretical user triggered use-after-free
Something like this:
CPU A CPU B
Acked-by: Sean Hefty <sean.hefty@intel.com>
======================== ================================
ucma_destroy_id()
wait_for_completion()
.. anything
ucma_put_ctx()
complete()
.. continues ...
ucma_leave_multicast()
mutex_lock(mut)
atomic_inc(ctx->ref)
mutex_unlock(mut)
ucma_free_ctx()
ucma_cleanup_multicast()
mutex_lock(mut)
kfree(mc)
rdma_leave_multicast(mc->ctx->cm_id,..
Fix it by latching the ref at 0. Once it goes to 0 mc and ctx cannot
leave the mutex(mut) protection.
The other atomic_inc in ucma_get_ctx is OK because mutex(mut) protects
it from racing with ucma_destroy_id.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Mon, 27 Jul 2015 08:38:52 +0000 (14:08 +0530)]
iw_cxgb4: set the default MPA version to 2
This enables ORD/IRD negotiation and its about time to enable it by
default
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Steve Wise [Tue, 28 Jul 2015 14:13:52 +0000 (09:13 -0500)]
RDMA/iser: Limit sgs to the device fastreg depth
Currently the sg tablesize, which dictates fast register page list
depth to use, does not take into account the limits of the rdma device.
So adjust it once we discover the device fastreg max depth limit. Also
adjust the max_sectors based on the resulting sg tablesize.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Roland Dreier [Mon, 27 Jul 2015 21:43:23 +0000 (14:43 -0700)]
IB/mlx5: Remove dead code from alloc_cached_mr()
The only place that assigns mr inside the loop already does a break.
So "if (mr)" will never be true here since the function initializes mr
to NULL at the top. We can just drop the extra if and break here.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Tue, 21 Jul 2015 12:36:07 +0000 (08:36 -0400)]
IB/qib: Change lkey table allocation to support more MRs
The lkey table is allocated with with a get_user_pages() with an
order based on a number of index bits from a module parameter.
The underlying kernel code cannot allocate that many contiguous pages.
There is no reason the underlying memory needs to be physically
contiguous.
This patch:
- switches the allocation/deallocation to vmalloc/vfree
- caps the number of bits to 23 to insure at least 1 generation bit
o this matches the module parameter description
Cc: stable@vger.kernel.org
Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Tue, 21 Jul 2015 11:40:12 +0000 (14:40 +0300)]
mlx5: Expose correct page_size_cap in device attributes
Should be all the page sizes that are supported by the
device.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Sagi Grimberg [Mon, 20 Jul 2015 16:54:36 +0000 (19:54 +0300)]
mlx5: Fix missing device local_dma_lkey
The mlx5 driver exposes device capability IB_DEVICE_LOCAL_DMA_LKEY
but does not set the the device local_dma_lkey. This breaks
rpcrdma drivers.
Query and set this lkey when creating the device resources.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Mon, 24 Aug 2015 03:52:59 +0000 (20:52 -0700)]
Linux 4.2-rc8
Linus Torvalds [Mon, 24 Aug 2015 03:46:22 +0000 (20:46 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A couple of major (hang and deadlock) fixes with fortunately fairly
rare triggering conditions. The PM oops is only really triggered by
people using enclosure services (rare) and the fnic driver is mostly
used in enterprise environments"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
SCSI: Fix NULL pointer dereference in runtime PM
fnic: Use the local variable instead of I/O flag to acquire io_req_lock in fnic_queuecommand() to avoid deadloack
Linus Torvalds [Sun, 23 Aug 2015 14:23:09 +0000 (07:23 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus
Pull MIPS bug fixes from Ralf Baechle:
"Two more fixes for 4.2.
One fixes a build issue with the LLVM assembler - LLVM assembler macro
names are case sensitive, GNU as macro names are insensitive; the
other corrects a license string (GPL v2, not GPLv2) such that the
module loader will recognice the license correctly"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
FIRMWARE: bcm47xx_nvram: Fix module license.
MIPS: Fix LLVM build issue.
Linus Torvalds [Sun, 23 Aug 2015 03:22:11 +0000 (20:22 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs
Pull 9p regression fix from Al Viro:
"Fix for breakage introduced when switching p9_client_{read,write}() to
struct iov_iter * (went into 4.1)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: ensure err is initialized to 0 in p9_client_read/write
Vincent Bernat [Sat, 15 Aug 2015 13:49:13 +0000 (15:49 +0200)]
9p: ensure err is initialized to 0 in p9_client_read/write
Some use of those functions were providing unitialized values to those
functions. Notably, when reading 0 bytes from an empty file on a 9P
filesystem, the return code of read() was not 0.
Tested with this simple program:
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, const char **argv)
{
assert(argc == 2);
char buffer[256];
int fd = open(argv[1], O_RDONLY|O_NOCTTY);
assert(fd >= 0);
assert(read(fd, buffer, 0) == 0);
return 0;
}
Cc: stable@vger.kernel.org # v4.1
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 22 Aug 2015 22:48:04 +0000 (15:48 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Another couple of small ARM fixes.
A patch from Masahiro Yamada who noticed that "make -jN all zImage"
would end up generating bad images where N > 1, and a patch from
Nicolas to fix the Marvell CPU user access optimisation code when page
faults are disabled"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8418/1: add boot image dependencies to not generate invalid images
ARM: 8414/1: __copy_to_user_memcpy: fix mmap semaphore usage
Linus Torvalds [Sat, 22 Aug 2015 15:15:36 +0000 (08:15 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Various low level fixes: fix more fallout from the FPU rework and the
asm entry code rework, plus an MSI rework fix, and an idle-tracing fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/math-emu: Fix crash in fork()
x86/fpu/math-emu: Fix math-emu boot crash
x86/idle: Restore trace_cpu_idle to mwait_idle() calls
x86/irq: Build correct vector mapping for multiple MSI interrupts
Revert "sched/x86_64: Don't save flags on context switch"
Linus Torvalds [Sat, 22 Aug 2015 15:06:28 +0000 (08:06 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Tooling fixes: a 'perf record' deadlock fix plus debuggability fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf top: Show backtrace when handling a SIGSEGV on --stdio mode
perf tools: Fix buildid processing
perf tools: Make fork event processing more resilient
perf tools: Avoid deadlock when map_groups are broken
Linus Torvalds [Sat, 22 Aug 2015 14:45:36 +0000 (07:45 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A series of small fixlets for a regression visible on OMAP devices
caused by the conversion of the OMAP interrupt chips to hierarchical
interrupt domains. Mostly one liners on the driver side plus a small
helper function in the core to avoid open coded mess in the drivers"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/crossbar: Restore set_wake functionality
irqchip/crossbar: Restore the mask on suspend behaviour
ARM: OMAP: wakeupgen: Restore the irq_set_type() mechanism
irqchip/crossbar: Restore the irq_set_type() mechanism
genirq: Introduce irq_chip_set_type_parent() helper
genirq: Don't return ENOSYS in irq_chip_retrigger_hierarchy
Linus Torvalds [Sat, 22 Aug 2015 14:37:41 +0000 (07:37 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two minimalistic fixes for 4.2 regressions:
- Eric fixed a thinko in the timer_list base switching code caused by
the overhaul of the timer wheel. It can cause a cpu to see the
wrong base for a timer while we move the timer around.
- Guenter fixed a regression for IMX if booted w/o device tree, where
the timer interrupt is not initialized and therefor the machine
fails to boot"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/imx: Fix boot with non-DT systems
timer: Write timer->flags atomically
Ingo Molnar [Wed, 27 May 2015 10:22:29 +0000 (12:22 +0200)]
x86/fpu/math-emu: Fix crash in fork()
During later stages of math-emu bootup the following crash triggers:
math_emulate: 0060:
c100d0a8
Kernel panic - not syncing: Math emulation needed in kernel
CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
[...]
Call Trace:
[<
c181d50d>] dump_stack+0x41/0x52
[<
c181c918>] panic+0x77/0x189
[<
c1003530>] ? math_error+0x140/0x140
[<
c164c2d7>] math_emulate+0xba7/0xbd0
[<
c100d0a8>] ? fpu__copy+0x138/0x1c0
[<
c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
[<
c136ac20>] ? proc_clear_tty+0x40/0x70
[<
c136ac6e>] ? session_clear_tty+0x1e/0x30
[<
c1003530>] ? math_error+0x140/0x140
[<
c1003575>] do_device_not_available+0x45/0x70
[<
c100d0a8>] ? fpu__copy+0x138/0x1c0
[<
c18258e6>] error_code+0x5a/0x60
[<
c1003530>] ? math_error+0x140/0x140
[<
c100d0a8>] ? fpu__copy+0x138/0x1c0
[<
c100c205>] arch_dup_task_struct+0x25/0x30
[<
c1048cea>] copy_process.part.51+0xea/0x1480
[<
c115a8e5>] ? dput+0x175/0x200
[<
c136af70>] ? no_tty+0x30/0x30
[<
c1157242>] ? do_vfs_ioctl+0x322/0x540
[<
c104a21a>] _do_fork+0xca/0x340
[<
c1057b06>] ? SyS_rt_sigaction+0x66/0x90
[<
c104a557>] SyS_clone+0x27/0x30
[<
c1824a80>] sysenter_do_call+0x12/0x12
The reason is the incorrect assumption in fpu_copy(), that FNSAVE
can be executed from math-emu kernels as well.
Don't try to copy the registers, the soft state will be copied
by fork anyway, so the child task inherits the parent task's
soft math state.
With this fix applied math-emu kernels boot up fine on modern
hardware and the 'no387 nofxsr' boot options.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sat, 22 Aug 2015 07:52:06 +0000 (09:52 +0200)]
x86/fpu/math-emu: Fix math-emu boot crash
On a math-emu bootup the following crash occurs:
Initializing CPU#0
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/traps.c:779!
invalid opcode: 0000 [#1] SMP
[...]
EIP is at do_device_not_available+0xe/0x70
[...]
Call Trace:
[<
c18238e6>] error_code+0x5a/0x60
[<
c1002bd0>] ? math_error+0x140/0x140
[<
c100bbd9>] ? fpu__init_cpu+0x59/0xa0
[<
c1012322>] cpu_init+0x202/0x330
[<
c104509f>] ? __native_set_fixmap+0x1f/0x30
[<
c1b56ab0>] trap_init+0x305/0x346
[<
c1b548af>] start_kernel+0x1a5/0x35d
[<
c1b542b4>] i386_start_kernel+0x82/0x86
The reason is that in the following commit:
b1276c48e91b ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")
I failed to consider math-emu's limitation that it cannot execute the
FNINIT instruction in kernel mode.
The long term fix might be to allow math-emu to execute (certain) kernel
mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
fix: initialize the emulation state explicitly without trapping out to
the FPU emulator.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Christoph Hellwig [Fri, 21 Aug 2015 21:11:54 +0000 (14:11 -0700)]
Add hch to .get_maintainer.ignore
While the idea behind get_maintainer seems highly useful it's
unfortunately way to trigger happy to grab people that once had a few
commits to files. For someone like me who does a lot of tree-wide API
work that leads to an incredible amount of Cc spam.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 21 Aug 2015 21:11:51 +0000 (14:11 -0700)]
mm: make page pfmemalloc check more robust
Commit
c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():
if (page->pfmemalloc && !page->mapping)
skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted. However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.
So the assumption is invalid if the networking code can see such a page.
And it seems it can. We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf. There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.
The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead. We can reuse the index
again but define it to an impossible value (-1UL). This is the page
index so it should never see the value that large. Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.
The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub]
Fixes:
c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 21 Aug 2015 18:18:10 +0000 (11:18 -0700)]
Merge tag 'pci-v4.2-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for ASPM-related NULL pointer dereference crashes on
Sparc and PowerPC and 64-bit PCI address-related HPMC crashes on
PA-RISC. These are both caused by things we merged in the v4.2 merge
window. Details:
Resource management
- Don't use 64-bit bus addresses on PA-RISC
Miscellaneous
- Tolerate hierarchies with no Root Port"
* tag 'pci-v4.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Don't use 64-bit bus addresses on PA-RISC
PCI: Tolerate hierarchies with no Root Port
Linus Torvalds [Fri, 21 Aug 2015 18:03:06 +0000 (11:03 -0700)]
Merge tag 'media/v4.2-3' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a regression fix at the videobuf2 core driver
- fix error handling at mantis probing code
- revert the IR encode patches, as the API is not mature enough.
So, better to postpone the changes to a latter Kernel
- fix Kconfig breakages on some randconfig scenarios.
* tag 'media/v4.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] mantis: Fix error handling in mantis_dma_init()
Revert "[media] rc: rc-ir-raw: Add scancode encoder callback"
Revert "[media] rc: rc-ir-raw: Add Manchester encoder (phase encoder) helper"
Revert "[media] rc: ir-rc5-decoder: Add encode capability"
Revert "[media] rc: ir-rc6-decoder: Add encode capability"
Revert "[media] rc: rc-core: Add support for encode_wakeup drivers"
Revert "[media] rc: rc-loopback: Add loopback of filter scancodes"
Revert "[media] rc: nuvoton-cir: Add support for writing wakeup samples via sysfs filter callback"
[media] vb2: Fix compilation breakage when !CONFIG_BUG
[media] vb2: Only requeue buffers immediately once streaming is started
[media] media/pci/cobalt: fix Kconfig and build when SND is not enabled
[media] media/dvb: fix ts2020.c Kconfig and build