GitHub/LineageOS/android_kernel_samsung_universal7580.git
Zach Brown [Fri, 23 Jul 2010 17:30:45 +0000 (10:30 -0700)]
RDS: lock rds_conn_count decrement in rds_conn_destroy()

rds_conn_destroy() can race with all other modifications of the
rds_conn_count but it was modifying the count without locking.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
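The locking rule above can be sketched in userspace C. This is a minimal illustration, not the kernel change itself: the demo_* names are hypothetical, and a small atomic_flag spinlock stands in for whatever lock the RDS code uses to guard rds_conn_count.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the connection lock and rds_conn_count. */
static atomic_flag demo_count_lock = ATOMIC_FLAG_INIT;
static unsigned int demo_count;

static void demo_count_lock_acquire(void)
{
	/* Spin until the flag was previously clear, i.e. we now own it. */
	while (atomic_flag_test_and_set_explicit(&demo_count_lock,
						 memory_order_acquire))
		;
}

static void demo_count_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_count_lock, memory_order_release);
}

/* Creation path: the increment happens under the lock. */
static void demo_count_conn_create(void)
{
	demo_count_lock_acquire();
	demo_count++;
	demo_count_lock_release();
}

/* Destroy path: the decrement takes the same lock as every other update. */
static void demo_count_conn_destroy(void)
{
	demo_count_lock_acquire();
	assert(demo_count > 0);
	demo_count--;
	demo_count_lock_release();
}
```

The point of the sketch is only that the destroy path uses the same lock as the creation path, so the two can no longer race on the counter.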
Zach Brown [Thu, 15 Jul 2010 19:34:33 +0000 (12:34 -0700)]
RDS/IB: protect the list of IB devices

The RDS IB device list wasn't protected by any locking.  Traversal in
both the get_mr and FMR flushing paths could race with addition and
removal.

List manipulation is done with RCU primitives and is protected by the
write side of a rwsem.  The list traversal in the get_mr fast path is
protected by an RCU read-side critical section.  The FMR list traversal is
more problematic because it can block while traversing the list.  We
protect this with the read side of the rwsem.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Wed, 14 Jul 2010 21:01:21 +0000 (14:01 -0700)]
RDS/IB: print IB event strings as well as their number

It's nice to not have to go digging in the code to see which event
occurred.  It's easy to throw together a quick array that maps the ib
event enums to their strings.  I didn't see anything in the stack that
does this translation for us, but I also didn't look very hard.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
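A lookup table of this kind might look like the sketch below. The enum and its members are hypothetical placeholders, not the real IB event enum; only the technique (a designated-initializer array indexed by the enum, with a bounds check) is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event numbers; the real IB enum has different members. */
enum demo_event {
	DEMO_EVENT_CQ_ERR,
	DEMO_EVENT_QP_FATAL,
	DEMO_EVENT_PORT_ACTIVE,
	DEMO_EVENT_NR		/* number of events, keep last */
};

/* Designated initializers keep each string next to its enum value. */
static const char *const demo_event_strings[] = {
	[DEMO_EVENT_CQ_ERR]	 = "DEMO_EVENT_CQ_ERR",
	[DEMO_EVENT_QP_FATAL]	 = "DEMO_EVENT_QP_FATAL",
	[DEMO_EVENT_PORT_ACTIVE] = "DEMO_EVENT_PORT_ACTIVE",
};

/* Catch an event added to the end of the enum without a string. */
static_assert(sizeof(demo_event_strings) / sizeof(demo_event_strings[0]) ==
	      DEMO_EVENT_NR, "every event needs a string");

/* Map an event number to its name, tolerating unknown numbers. */
static const char *demo_event_str(unsigned int event)
{
	if (event < DEMO_EVENT_NR && demo_event_strings[event])
		return demo_event_strings[event];
	return "UNKNOWN";
}
```

A log statement can then print both forms, for example the string from demo_event_str(event) followed by the raw number.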
Chris Mason [Tue, 20 Jul 2010 00:06:46 +0000 (17:06 -0700)]
RDS: flush fmrs before allocating new ones

Flushing FMRs is somewhat expensive, and is currently kicked off when
the interrupt handler notices that we are getting low.  The result of
this is that FMR flushing only happens from the interrupt cpus.

This spreads the load more effectively by triggering flushes just before
we allocate a new FMR.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
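The idea of flushing from the allocation path can be sketched as follows. All names, fields and the threshold are hypothetical, and the "flush" here only moves counters around; the sketch shows where the flush is triggered, not how FMRs are actually invalidated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool bookkeeping, with arbitrary limits. */
struct demo_fmr_pool {
	unsigned int free_count;	/* clean entries ready for reuse */
	unsigned int dirty_count;	/* released entries awaiting a flush */
	unsigned int in_use;		/* entries currently handed out */
	unsigned int dirty_limit;	/* flush once this many are dirty */
	unsigned int flushes;		/* flushes run so far */
};

/* Stand-in for the expensive flush: dirty entries become free again. */
static void demo_fmr_pool_flush(struct demo_fmr_pool *pool)
{
	pool->free_count += pool->dirty_count;
	pool->dirty_count = 0;
	pool->flushes++;
}

/* Allocation path: flush first if needed, on the allocating CPU. */
static bool demo_fmr_pool_alloc(struct demo_fmr_pool *pool)
{
	if (pool->dirty_count >= pool->dirty_limit || pool->free_count == 0)
		demo_fmr_pool_flush(pool);
	if (pool->free_count == 0)
		return false;
	pool->free_count--;
	pool->in_use++;
	return true;
}

/* Release path: the entry is only marked dirty, no flush happens here. */
static void demo_fmr_pool_release(struct demo_fmr_pool *pool)
{
	assert(pool->in_use > 0);
	pool->in_use--;
	pool->dirty_count++;
}
```

Because every allocating thread may run the flush, the cost is spread over the CPUs that allocate rather than concentrated on the CPUs that take interrupts.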
Chris Mason [Tue, 20 Jul 2010 00:02:41 +0000 (17:02 -0700)]
RDS: properly use sg_init_table

This is only needed to keep debugging code from bugging.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Zach Brown [Wed, 14 Jul 2010 20:55:35 +0000 (13:55 -0700)]
RDS/IB: track signaled sends

We're seeing bugs today where IB connection shutdown clears the send
ring while the tasklet is processing completed sends.  Implementation
details cause this to dereference a null pointer.  Shutdown needs to
wait for send completion to stop before tearing down the connection.  We
can't simply wait for the ring to empty because it may contain
unsignaled sends that will never be processed.

This patch tracks the number of signaled sends that we've posted and
waits for them to complete.  It also makes sure that the tasklet has
finished executing.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
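The counting part of this scheme can be sketched with a C11 atomic. The structure and function names are hypothetical, and the sketch leaves out what the patch also does: sleeping until the count reaches zero and making sure the tasklet has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection state. */
struct demo_send_conn {
	atomic_int signaled_sends;	/* signaled sends posted, not completed */
};

/* Post path: only sends that will generate a completion are counted. */
static void demo_post_send(struct demo_send_conn *conn, bool signaled)
{
	if (signaled)
		atomic_fetch_add(&conn->signaled_sends, 1);
}

/* Completion path: each completion retires one signaled send. */
static void demo_send_completed(struct demo_send_conn *conn)
{
	int prev = atomic_fetch_sub(&conn->signaled_sends, 1);

	assert(prev > 0);	/* a completion without a post is a bug */
}

/* Shutdown may tear down the ring only once this returns true. */
static bool demo_sends_drained(struct demo_send_conn *conn)
{
	return atomic_load(&conn->signaled_sends) == 0;
}
```

Unsignaled sends never touch the counter, which is why waiting on it cannot get stuck the way waiting for an empty ring can.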
Zach Brown [Fri, 9 Jul 2010 19:26:20 +0000 (12:26 -0700)]
RDS: remove __init and __exit annotation

The trivial amount of memory saved isn't worth the cost of dealing with section
mismatches.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Andy Grover [Wed, 7 Jul 2010 23:46:26 +0000 (16:46 -0700)]
RDS/IB: Use SLAB_HWCACHE_ALIGN flag for kmem_cache_create()

We are *definitely* counting cycles as closely as DaveM, so
ensure hwcache alignment for our recv ring control structs.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Zach Brown [Tue, 6 Jul 2010 22:04:34 +0000 (15:04 -0700)]
RDS/IB: always process recv completions

The recv refill path was leaking fragments because the recv event handler had
marked a ring element as free without freeing its frag.  This was happening
because it wasn't processing receives when the conn wasn't marked up or
connecting, as can be the case if it races with rmmod.

Two observations support always processing receives in the callback.

First, buildup should only post receives, thus triggering recv event handler
calls, once it has built up all the state to handle them.  Teardown should
destroy the CQ and drain the ring before tearing down the state needed to
process recvs.  Both appear to be true today.

Second, this test was fundamentally racy.  There is nothing to stop rmmod and
connection destruction from swooping in the moment after the conn state was
sampled but before real receive processing starts.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Tue, 6 Jul 2010 22:08:48 +0000 (15:08 -0700)]
RDS: return to a single-threaded krdsd

We were seeing very nasty bugs due to fundamental assumptions the current code
makes about concurrent work struct processing.  The code simply isn't able to
handle concurrent connection shutdown work function execution today, for
example, which is very much possible once a multi-threaded krdsd was
introduced.  The problem compounds as additional work structs are added to the
mix.

krdsd is no longer performance critical now that send and receive posting and
FMR flushing are done elsewhere, so the safest fix is to move back to the
single threaded krdsd that the current code was built around.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Tue, 6 Jul 2010 22:09:56 +0000 (15:09 -0700)]
RDS/IB: create a work queue for FMR flushing

This patch moves the FMR flushing work into its own multi-threaded work queue.
This is to maintain performance in preparation for returning the main krdsd
work queue back to a single threaded work queue to avoid deep-rooted
concurrency bugs.

This is also good because it further separates FMRs, which might be removed
some day, from the rest of the code base.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Fri, 25 Jun 2010 21:58:16 +0000 (14:58 -0700)]
RDS/IB: destroy connections on rmmod

IB connections were not being destroyed during rmmod.

First, the IB device removal callback was recently changed to disconnect
connections that used the removing device rather than destroying them.  So
connections with devices during rmmod were not being destroyed.

Second, rds_ib_destroy_nodev_conns() was being called before connections were
disassociated from devices.  It would almost never find connections in the
nodev list.

We first get rid of rds_ib_destroy_conns(), which is no longer called, and
refactor the existing caller into the main body of the function and get rid of
the list and lock wrappers.

Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
removed the IB device from all the conns and put the conns on the nodev list.

The result is that IB connections are destroyed by rmmod.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Fri, 25 Jun 2010 21:59:49 +0000 (14:59 -0700)]
RDS/IB: wait for IB dev freeing work to finish during rmmod

The RDS IB client removal callback can queue work to drop the final reference
to an IB device.  We have to make sure that this function has returned before
we complete rmmod or the work threads can try to execute freed code.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Andy Grover [Thu, 24 Jun 2010 01:06:30 +0000 (18:06 -0700)]
RDS/IB: Make ib_recv_refill return void

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Fri, 11 Jun 2010 23:24:42 +0000 (16:24 -0700)]
RDS: Remove unused XLIST_PTR_TAIL and xlist_protect()

Not used.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Fri, 11 Jun 2010 22:18:51 +0000 (15:18 -0700)]
RDS: whitespace

Chris Mason [Fri, 11 Jun 2010 18:26:02 +0000 (11:26 -0700)]
RDS: use delayed work for the FMR flushes

Using a delayed work queue helps us make sure a healthy number of FMRs
have queued up over the limit.  It makes for a large improvement in RDMA
iops.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 11 Jun 2010 18:18:57 +0000 (11:18 -0700)]
rds: more FMRs are faster

When we add more FMRs, we flush them less often and so we go faster.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Fri, 11 Jun 2010 18:17:59 +0000 (11:17 -0700)]
rds: recycle FMRs through lockless lists

FMR allocation and recycling is performance critical and fairly lock
intensive.  The current code has a per connection lock that all
processes bang on and it becomes a major bottleneck on large systems.

This changes things to use a number of cmpxchg based lists instead,
allowing us to go through the whole FMR lifecycle without locking inside
RDS.

Zach Brown pointed out that our usage of cmpxchg for xlist removal is
racy if someone manages to remove and add back an FMR struct into the list
while another CPU can see the FMR's address at the head of the list.

The second CPU might assume the list hasn't changed when in fact any
number of operations might have happened in between the deletion and
reinsertion.

This commit maintains a per cpu count of CPUs that are currently
in xlist removal, and establishes a grace period to make sure that
nobody can see an entry we have just removed from the list.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
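A compare-and-swap based list of this general shape can be sketched with C11 atomics. The demo_* names are hypothetical and this is not the xlist code. The sketch also does not reproduce the commit's per-CPU grace period: it avoids the remove-and-reinsert hazard described above by never popping a single node, only detaching the whole chain with one atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_xnode {
	struct demo_xnode *next;
	int id;
};

struct demo_xlist {
	_Atomic(struct demo_xnode *) head;
};

/* Push one node with a compare-and-swap loop; no lock is taken. */
static void demo_xlist_push(struct demo_xlist *list, struct demo_xnode *node)
{
	struct demo_xnode *old;

	assert(node != NULL);
	old = atomic_load(&list->head);
	do {
		node->next = old;
		/* On failure, old is refreshed with the current head. */
	} while (!atomic_compare_exchange_weak(&list->head, &old, node));
}

/*
 * Detach the whole chain in one step.  Because no thread compares a
 * stale head pointer against a reinserted node here, the hazard seen
 * with single-node removal does not arise in this simplified form.
 */
static struct demo_xnode *demo_xlist_take_all(struct demo_xlist *list)
{
	return atomic_exchange(&list->head, (struct demo_xnode *)NULL);
}
```

The caller that takes the chain owns every node on it and can walk it without further synchronization.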
Zach Brown [Fri, 4 Jun 2010 21:41:41 +0000 (14:41 -0700)]
rds: fix rds_send_xmit() serialization

rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
mutex so that it could be called from the IB receive tasklet path.  This broke
the TCP transport because its xmit method can block and masks and unmasks
interrupts.

This patch serializes callers to rds_send_xmit() with a simple bit instead of
the current spinlock or previous mutex.  This enables rds_send_xmit() to be
called from any context and to call functions which block.  Getting rid of the
c_send_lock exposes the bare c_lock acquisitions which are changed to block
interrupts.

A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
rds_send_xmit() before tearing down partial send state.  This lets us get rid
of c_senders.

rds_send_xmit() is changed to check the conn state after acquiring the
RDS_IN_XMIT bit to resolve races with the shutdown path.  Previously both
worked with the conn state and then the lock in the same order, allowing them
to race and execute the paths concurrently.

rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
properly ensures that rds_send_xmit() can't start once the conn state has been
changed.  We can remove its previous use of the spinlock.

Finally, c_send_generation is redundant.  Callers can race to test the c_flags
bit by simply retrying instead of racing to test the c_send_generation atomic.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
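Serializing callers with a single bit, and checking the connection state only after the bit is held, can be sketched like this. The structure, the boolean "up" state and the function names are hypothetical; atomic_flag plays the role that the RDS_IN_XMIT bit plays in the description above, and the waitqueue used by shutdown is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state for the sketch. */
struct demo_xmit_conn {
	atomic_flag in_xmit;	/* set while one caller is transmitting */
	bool up;		/* stand-in for the connection state */
	unsigned int sent;
};

static void demo_xmit_conn_init(struct demo_xmit_conn *conn, bool up)
{
	atomic_flag_clear(&conn->in_xmit);
	conn->up = up;
	conn->sent = 0;
}

/* Returns true if this caller now owns the transmit path. */
static bool demo_acquire_in_xmit(struct demo_xmit_conn *conn)
{
	/* test_and_set returns the old value: false means we won. */
	return !atomic_flag_test_and_set_explicit(&conn->in_xmit,
						  memory_order_acquire);
}

static void demo_release_in_xmit(struct demo_xmit_conn *conn)
{
	atomic_flag_clear_explicit(&conn->in_xmit, memory_order_release);
}

/* Returns how many messages this call sent (0 or 1 in the sketch). */
static int demo_send_xmit(struct demo_xmit_conn *conn)
{
	int sent = 0;

	if (!demo_acquire_in_xmit(conn))
		return 0;	/* another caller is already transmitting */

	/* State is checked only after the bit is held. */
	if (conn->up) {
		conn->sent++;
		sent = 1;
	}

	demo_release_in_xmit(conn);
	return sent;
}
```

Since the bit is neither a spinlock nor a mutex, the holder is free to block, which is what the TCP transport needs.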
Zach Brown [Fri, 4 Jun 2010 21:25:27 +0000 (14:25 -0700)]
rds: block ints when acquiring c_lock in rds_conn_message_info()

conn->c_lock is acquired in interrupt context.  rds_conn_message_info() is
called from user context and was acquiring c_lock without blocking interrupts,
leading to possible deadlocks.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Fri, 4 Jun 2010 21:26:32 +0000 (14:26 -0700)]
rds: remove unused rds_send_acked_before()

rds_send_acked_before() wasn't blocking interrupts when acquiring c_lock from
user context but nothing calls it.  Rather than fix its use of c_lock we just
remove the function.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Chris Mason [Thu, 27 May 2010 04:45:06 +0000 (21:45 -0700)]
RDS: use friendly gfp masks for prefill

When prefilling the rds frags, we end up doing a lot of allocations.
We're not in atomic context here, and so there's no reason to dip into
atomic reserves.  This changes the prefills to use masks that allow
waiting.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 27 May 2010 05:05:37 +0000 (22:05 -0700)]
RDS/IB: Add caching of frags and incs

This patch is based heavily on an initial patch by Chris Mason.
Instead of freeing slab memory and pages, it keeps them, and
funnels them back to be reused.

The lock minimization strategy uses xchg and cmpxchg atomic ops
for manipulation of pointers to list heads. We anchor the lists with a
pointer to a list_head struct instead of a static list_head struct.
We just have to carefully use the existing primitives with
the difference between a pointer and a static head struct.

For example, 'list_empty()' means that our anchor pointer points to a list with
a single item instead of meaning that our static head element doesn't point to
any list items.

Original patch by Chris, with significant mods and fixes by Andy and Zach.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>
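The difference between a static head and a pointer anchor can be sketched with a minimal circular list. The demo_* helpers are hypothetical and only modelled on the kernel's list_head; the xchg/cmpxchg handling of the anchor pointer is left out so that the sketch shows just the changed meaning of "empty".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True when the node links only to itself. */
static bool demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}

/* Insert item just before head, i.e. at the tail of head's ring. */
static void demo_list_add_tail(struct demo_list_head *item,
			       struct demo_list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Add to a pointer-anchored ring; returns the anchor to store. */
static struct demo_list_head *demo_anchor_add(struct demo_list_head *anchor,
					      struct demo_list_head *item)
{
	demo_init_list_head(item);
	if (!anchor)
		return item;	/* first item becomes the anchor */
	demo_list_add_tail(item, anchor);
	return anchor;
}

/* Count items: a NULL anchor means none, and the anchor itself counts. */
static size_t demo_anchor_count(const struct demo_list_head *anchor)
{
	const struct demo_list_head *pos;
	size_t n;

	if (!anchor)
		return 0;
	n = 1;
	for (pos = anchor->next; pos != anchor; pos = pos->next)
		n++;
	return n;
}
```

With this layout a NULL anchor means "no items", while demo_list_empty(anchor) being true means "exactly one item".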
Andy Grover [Tue, 25 May 2010 18:20:09 +0000 (11:20 -0700)]
RDS/IB: Remove ib_recv_unmap_page()

All it does is call unmap_sg(), so just call that directly.

The comment above unmap_page also may be incorrect, so we
shouldn't hold on to it, either.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Tue, 25 May 2010 03:28:49 +0000 (20:28 -0700)]
RDS: Assume recv->r_frag is always NULL in refill_one()

refill_one() should never be called on a recv struct that
doesn't need a new r_frag allocated. Add a WARN and remove
conditional around r_frag alloc code.

Also, add a comment to explain why r_ibinc may or may not
need refilling.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Tue, 25 May 2010 03:12:41 +0000 (20:12 -0700)]
RDS: Use page_remainder_alloc() for recv bufs

Instead of splitting up a page into RDS_FRAG_SIZE chunks
ourselves, ask rds_page_remainder_alloc() to do it. While it
is possible PAGE_SIZE > FRAG_SIZE, on x86en it isn't, so having
duplicate "carve up a page into buffers" code seems excessive.

The other modification this spawns is the use of a single
struct scatterlist in rds_page_frag instead of a bare page ptr.
This causes verbosity to increase in some places, and decrease
in others.

Finally, I decided to unify the lifetimes and alloc/free of
rds_page_frag and its page. This is a nice simplification in itself,
but will be extra-nice once we come to adding cmason's recycling
patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
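What a "page remainder" allocator does can be sketched in userspace. This is not rds_page_remainder_alloc() itself: the names, the 4096-byte page size and the use of malloc() are assumptions, and earlier pages are simply abandoned where real code would hold page references.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the sketch */

struct demo_remainder {
	unsigned char *page;		/* current backing page, or NULL */
	size_t offset;			/* first unused byte in the page */
	unsigned int pages_allocated;
};

/*
 * Hand out `bytes` from the current page, starting a fresh page when
 * the remainder is too small.  Old pages are not tracked or freed here.
 */
static void *demo_remainder_alloc(struct demo_remainder *rem, size_t bytes)
{
	void *chunk;

	assert(bytes > 0 && bytes <= DEMO_PAGE_SIZE);

	if (!rem->page || DEMO_PAGE_SIZE - rem->offset < bytes) {
		rem->page = malloc(DEMO_PAGE_SIZE);
		if (!rem->page)
			return NULL;
		rem->offset = 0;
		rem->pages_allocated++;
	}

	chunk = rem->page + rem->offset;
	rem->offset += bytes;
	return chunk;
}
```

When the requested size equals the page size, as the message says is the case on x86, every call simply gets a whole page; smaller requests share one.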
Zach Brown [Mon, 24 May 2010 20:16:57 +0000 (13:16 -0700)]
RDS/IB: disconnect when IB devices are removed

Currently IB device removal destroys connections which are associated with the
device.  This prevents connections from being re-established when replacement
devices are added.

Instead we'll queue shutdown work on the connections as their devices are
removed.  When we see that devices are added we trigger connection attempts on
all connections that don't currently have a device.

The result is that RDS sockets can resume device-independent work (bcopy, not
RDMA) across IB device removal and restoration.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Mon, 24 May 2010 20:14:36 +0000 (13:14 -0700)]
RDS: introduce rds_conn_connect_if_down()

A few paths had the same block of code to queue a connection's connect work if
it was in the right state.  Let's move this in to a helper function.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Tue, 18 May 2010 22:48:51 +0000 (15:48 -0700)]
RDS/IB: add refcount tracking to struct rds_ib_device

The RDS IB client .remove callback used to free the rds_ibdev for the given
device unconditionally.  This could race other users of the struct.  This patch
adds refcounting so that we only free the rds_ibdev once all of its users are
done.

Many rds_ibdev users are tied to connections.  We give the connection a
reference and change these users to reference the device in the connection
instead of looking it up in the IB client data.  The only user of the IB client
data remaining is the first lookup of the device as connections are built up.

Incrementing the reference count of a device found in the IB client data could
race with final freeing so we use an RCU grace period to make sure that freeing
won't happen until those lookups are done.

MRs need the rds_ibdev to get at the pool that they're freed in to.  They exist
outside a connection and many MRs can reference different devices from one
socket, so it was natural to have each MR hold a reference.  MR refs can be
dropped from interrupt handlers and final device teardown can block so we push
it off to a work struct.  Pool teardown had to be fixed to cancel its pending
work instead of deadlocking waiting for all queued work, including itself, to
finish.

MRs get their reference from the global device list, which gets a reference.
It is left unprotected by locks and remains racy.  A simple global lock would
be a significant bottleneck.  More scalable (complicated) locking should be
done carefully in a later patch.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
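The basic get/put discipline described here can be sketched with a C11 atomic counter. The demo_* names are hypothetical, and the sketch deliberately omits the RCU grace period and the deferral of teardown to a work struct; a flag is set where real code would free the device.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted device. */
struct demo_ibdev {
	atomic_int refcount;
	bool freed;	/* set instead of freeing, so it stays observable */
};

static void demo_ibdev_init(struct demo_ibdev *dev)
{
	atomic_init(&dev->refcount, 1);	/* the creator holds one reference */
	dev->freed = false;
}

/* Each user (a connection, an MR) takes its own reference. */
static void demo_ibdev_get(struct demo_ibdev *dev)
{
	int prev = atomic_fetch_add(&dev->refcount, 1);

	assert(prev > 0);	/* reviving a dead object is a bug */
}

/* Dropping the last reference is what triggers teardown. */
static void demo_ibdev_put(struct demo_ibdev *dev)
{
	int prev = atomic_fetch_sub(&dev->refcount, 1);

	assert(prev > 0);
	if (prev == 1)
		dev->freed = true;	/* real code would tear down here */
}
```

In this model the .remove callback only drops the creator's reference, so the device outlives the callback for as long as a connection or MR still holds one.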
Zach Brown [Tue, 18 May 2010 22:44:50 +0000 (15:44 -0700)]
RDS/IB: get the xmit max_sge from the RDS IB device on the connection

rds_ib_xmit_rdma() was calling ib_get_client_data() to get at the rds_ibdevice
just to get the max_sge for the transmit.  This patch instead has it get it
directly off the rds_ibdev which is stored on the connection.

The current code won't free the rds_ibdev until all the IB connections that use
it are freed.  So it's safe to reference the rds_ibdev this way.  In the future
it also makes it easier to support proper reference counting of the rds_ibdev
struct.

As an additional bonus, this gets rid of the performance hit of calling in to
the IB stack to look up the rds_ibdev.  The current implementation in the IB
stack acquires an interrupt blocking spinlock to protect the registration of
client callback data.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Zach Brown [Mon, 24 May 2010 20:14:59 +0000 (13:14 -0700)]
RDS/IB: rds_ib_cm_handle_connect() forgot to unlock c_cm_lock

rds_ib_cm_handle_connect() could return without unlocking the c_cm_lock if
rds_setup_qp() failed.  Rather than adding another imbalanced mutex_unlock() to
this error path we only unlock the mutex once as we exit the function, reducing
the likelihood of making this same mistake in the future.  We remove the
previous multiple return sites, leaving one unambiguous return path.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
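The single-exit pattern can be sketched as below. The toy lock, the function name and its boolean parameters are hypothetical; they exist only so that the error paths can be exercised and the lock state inspected afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock whose state can be inspected. */
struct demo_cm_lock {
	bool held;
};

static void demo_cm_lock_acquire(struct demo_cm_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void demo_cm_lock_release(struct demo_cm_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Hypothetical connect handler with two steps that may fail. */
static int demo_handle_connect(struct demo_cm_lock *lock,
			       bool state_ok, bool setup_ok)
{
	int err = 0;

	demo_cm_lock_acquire(lock);

	if (!state_ok) {
		err = -EBUSY;
		goto out;
	}
	if (!setup_ok) {
		err = -ENOMEM;
		goto out;
	}
	/* ... the success path would continue here ... */

out:
	/* The only unlock and the only return in the function. */
	demo_cm_lock_release(lock);
	return err;
}
```

A new failure case added later only needs to set err and jump to out; it cannot forget the unlock.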
Chris Mason [Tue, 11 May 2010 23:15:35 +0000 (16:15 -0700)]
rds: Fix reference counting for xmit_atomic and xmit_rdma

This makes sure we have the proper number of references in
rds_ib_xmit_atomic and rds_ib_xmit_rdma.  We also consistently
drop references the same way for all message types as the IOs end.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 11 May 2010 22:15:15 +0000 (15:15 -0700)]
rds: use RCU to protect the connection hash

The connection hash was almost entirely RCU ready, this
just makes the final couple of changes to use RCU instead
of spinlocks for everything.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 11 May 2010 22:14:52 +0000 (15:14 -0700)]
RDS: use locking on the connection hash list

rds_conn_destroy really needs locking while it changes the
connection hash.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 11 May 2010 22:14:16 +0000 (15:14 -0700)]
rds: Fix RDMA message reference counting

The RDS send_xmit code was trying to get fancy with message
counting and was dropping the final reference on the RDMA messages
too early.  This resulted in memory corruption and oopsen.

The fix here is to always add a ref as the parts of the message pass
through rds_send_xmit, and always drop a ref as the parts of the message
go through completion handling.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 11 May 2010 22:11:11 +0000 (15:11 -0700)]
rds: don't let RDS shutdown a connection while senders are present

This is the first in a long line of patches that tries to fix races
between RDS connection shutdown and RDS traffic.

Here we are maintaining a count of active senders to make sure
the connection doesn't go away while they are using it.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 11 May 2010 22:09:45 +0000 (15:09 -0700)]
rds: Use RCU for the bind lookup searches

The RDS bind lookups are somewhat expensive in terms of CPU
time and locking overhead.  This commit changes them into a
faster RCU based hash tree instead of the rbtrees they were using
before.

On large NUMA systems it is a significant improvement.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Andy Grover [Fri, 23 Apr 2010 17:49:53 +0000 (10:49 -0700)]
RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node()

Allocate send/recv rings in memory that is node-local to the HCA.
This significantly helps performance.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Fri, 23 Apr 2010 18:04:21 +0000 (11:04 -0700)]
RDS/IB: Remove unused variable in ib_remove_addr()

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Chris Mason [Fri, 23 Apr 2010 01:59:15 +0000 (21:59 -0400)]
rds: rcu-ize rds_ib_get_device()

rds_ib_get_device is called very often as we turn an
IP address into a corresponding device structure.  It currently
takes a global spinlock as it walks different lists to find active
devices.

This commit changes the lists over to RCU, which isn't very complex
because they are not updated very often at all.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 21 Apr 2010 20:09:28 +0000 (13:09 -0700)]
rds: per-rm flush_wait waitq

This removes a global waitqueue used to wait for rds messages
and replaces it with a waitqueue inside the rds_message struct.

The global waitqueue turns into a global lock and significantly
bottlenecks operations on large machines.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 21 Apr 2010 20:04:43 +0000 (13:04 -0700)]
rds: switch to rwlock on bind_lock

The bind_lock is almost entirely readonly, but it gets
hammered during normal operations and is a major bottleneck.

This commit changes it to an rwlock, which takes it from 80%
of the system time on a big numa machine down to much lower
numbers.

A better fix would involve RCU, which is done in a later commit.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Andy Grover [Fri, 16 Apr 2010 00:19:29 +0000 (17:19 -0700)]
RDS: Update comments in rds_send_xmit()

Update comments to reflect changes in previous commit.

Keeping as separate commits due to different authorship.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Chris Mason [Thu, 15 Apr 2010 20:38:14 +0000 (16:38 -0400)]
RDS: Use a generation counter to avoid rds_send_xmit loop

rds_send_xmit is required to loop around after it releases the lock
because someone else could have done a trylock, found someone working on the
list and backed off.

But, once we drop our lock, it is possible that someone else does come
in and make progress on the list.  We should detect this and not loop
around if another process is actually working on the list.

This patch adds a generation counter that is bumped every time we
get the lock and do some send work.  If the retry notices someone else
has bumped the generation counter, it does not need to loop around and
continue working.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
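The retry decision described above can be sketched as follows. The structure and function names are hypothetical, the queue is reduced to a counter, and the lock itself is not modelled; only the "bump on each pass, compare before looping" logic is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection with a queue depth and a generation counter. */
struct demo_gen_conn {
	atomic_uint send_generation;
	unsigned int queued;
};

/*
 * One pass of send work, assumed to run with the send lock held.
 * Bumps the generation and returns the value this pass observed.
 */
static unsigned int demo_send_pass(struct demo_gen_conn *conn,
				   unsigned int budget)
{
	unsigned int gen = atomic_fetch_add(&conn->send_generation, 1) + 1;

	while (budget > 0 && conn->queued > 0) {
		conn->queued--;
		budget--;
	}
	return gen;
}

/*
 * After the lock is dropped: loop again only if work remains and
 * nobody else has started a pass since ours.
 */
static bool demo_should_retry(struct demo_gen_conn *conn, unsigned int my_gen)
{
	return conn->queued > 0 &&
	       atomic_load(&conn->send_generation) == my_gen;
}
```

If another caller has bumped the counter in the meantime, that caller is responsible for the remaining work and the first one can return.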
Andy Grover [Thu, 1 Apr 2010 01:56:25 +0000 (18:56 -0700)]
RDS: Get pong working again

Call send_xmit() directly from pong()

Set pongs as op_active

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Tue, 30 Mar 2010 00:10:01 +0000 (17:10 -0700)]
RDS: Do wait_event_interruptible instead of wait_event

Can't see a reason not to allow signals to interrupt the wait.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Tue, 30 Mar 2010 00:08:49 +0000 (17:08 -0700)]
RDS: Remove send_quota from send_xmit()

The purpose of the send quota was really to give fairness
when different connections were all using the same
workq thread to send backlogged msgs -- they could only send
so many before another connection could make progress.

Now that each connection is pushing the backlog from its
completion handler, they are all guaranteed to make progress
and the quota isn't needed any longer.

A thread *will* have to send all previously queued data, as well
as any further msgs placed on the queue while c_send_lock
was held. In a pathological case a single process can get
roped into doing this for long periods while other threads
get off free. But, since it can only do this until the transport
reports full, this is a bounded scenario.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Tue, 30 Mar 2010 00:47:30 +0000 (17:47 -0700)]
RDS: Move atomic stats from general to ib-specific area

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 29 Mar 2010 23:52:12 +0000 (16:52 -0700)]
RDS: rds_message_unmapped() doesn't need to check if queue active

If the queue has nobody on it, then wake_up does nothing.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 29 Mar 2010 23:50:54 +0000 (16:50 -0700)]
RDS: Fix locking in send on m_rs_lock

Do not nest m_rs_lock under c_lock

Disable interrupts in {rdma,atomic}_send_complete

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 29 Mar 2010 23:46:46 +0000 (16:46 -0700)]
RDS: Use NOWAIT in message_map_pages()

Can no longer block, so use NOWAIT.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 29 Mar 2010 23:45:40 +0000 (16:45 -0700)]
RDS: Bypass workqueue when queueing cong updates

Now that rds_send_xmit() does not block, we can call it directly
instead of going through the helper thread.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 29 Mar 2010 23:20:18 +0000 (16:20 -0700)]
RDS: Call rds_send_xmit() directly from sendmsg()

rds_sendmsg() is calling the send worker function to
send the just-queued datagrams, presumably because it wants
the behavior where anything not sent will re-call the send
worker. We now ensure all queued datagrams are sent by retrying
from the send completion handler, so this isn't needed any more.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Wed, 24 Mar 2010 00:48:04 +0000 (17:48 -0700)]
RDS: rds_send_xmit() locking/irq fixes

rds_message_put() cannot be called with irqs off, so move it after
irqs are re-enabled.

Spinlocks throughout the function do not need to use _irqsave because
the lock of c_send_lock at top already disabled irqs.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Wed, 24 Mar 2010 00:39:07 +0000 (17:39 -0700)]
RDS: Change send lock from a mutex to a spinlock

This change allows us to call rds_send_xmit() from a tasklet,
which is crucial to our new operating model.

* Change c_send_lock to a spinlock
* Update stats fields "sem_" to "_lock"
* Remove unneeded rds_conn_is_sending()

About locking between shutdown and send -- send checks if the
connection is up. Shutdown puts the connection into
DISCONNECTING. After this, all threads entering send will exit
immediately. However, a thread could be *in* send_xmit(), so
shutdown acquires the c_send_lock to ensure everyone is out
before proceeding with connection shutdown.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Fri, 19 Mar 2010 00:19:52 +0000 (17:19 -0700)]
RDS: Refill recv ring directly from tasklet

Performance is better if we use allocations that don't block
to refill the receive ring. Since the whole reason we were
kicking out to the worker thread was so we could do blocking
allocs, we no longer need to do this.

Remove gfp params from rds_ib_recv_refill(); we always use
GFP_NOWAIT.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 22 Mar 2010 22:22:04 +0000 (15:22 -0700)]
RDS: Stop supporting old cong map sending method

We now ask the transport to give us a rm for the congestion
map, and then we handle it normally. Previously, the
transport defined a function that we would call to send
a congestion map.

Convert TCP and loop transports to new cong map method.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Thu, 4 Mar 2010 03:25:21 +0000 (19:25 -0800)]
RDS/IB: Do not wait for send ring to be empty on conn shutdown

Now that we are signaling send completions much less, we are likely
to have dirty entries in the send queue when the connection is
shut down (on rmmod, for example.) These are cleaned up a little
further down in conn_shutdown, but if we wait on the ring_empty_wait
for them, it'll never happen, and we hang on unload.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Andy Grover [Mon, 1 Mar 2010 22:03:09 +0000 (14:03 -0800)]
RDS: Perform unmapping ops in stages

Previously, RDS would wait until the final send WR had completed
and then handle cleanup. With silent ops, we do not know
if an atomic, rdma, or data op will be last. This patch
handles any of these cases by keeping a pointer to the last
op in the message in m_last_op.

When the TX completion event fires, rds dispatches to per-op-type
cleanup functions, and then does whole-message cleanup, if the
last op equalled m_last_op.
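
The staged cleanup described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct op,
struct message, op_cleanup() and tx_complete() are hypothetical
stand-ins, not the RDS structures or functions; only the role of
m_last_op is taken from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real RDS structures. */
enum op_type { OP_ATOMIC, OP_RDMA, OP_DATA };

struct op {
	enum op_type type;
	int unmapped;		/* set by the per-op cleanup */
};

struct message {
	struct op atomic, rdma, data;
	struct op *last_op;	/* plays the role of m_last_op */
	int cleaned;		/* whole-message cleanup has run */
};

/* Per-op cleanup; a real transport would unmap DMA here. */
static void op_cleanup(struct op *op)
{
	op->unmapped = 1;
}

/* Called once for each op whose completion has been seen. */
static void tx_complete(struct message *m, struct op *op)
{
	assert(m != NULL && op != NULL);

	op_cleanup(op);

	/* Whole-message cleanup only once the final op has completed. */
	if (op == m->last_op)
		m->cleaned = 1;
}
```

Whichever op type was queued last is recorded in last_op, so the
completion path does not need to know in advance whether that is
an atomic, rdma, or data op.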

This patch also moves towards having op-specific functions take
the op struct, instead of the overall rm struct.

rds_ib_connection has a pointer to keep track of a partially-
completed data send operation. This patch changes it from an
rds_message pointer to the narrower rm_data_op pointer, and
modifies places that use this pointer as needed.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Make sure cmsgs aren't used in improper ways
Andy Grover [Tue, 13 Apr 2010 19:00:35 +0000 (12:00 -0700)]
RDS: Make sure cmsgs aren't used in improper ways

It hasn't cropped up in the field, but this code ensures it is
impossible to issue operations that pass an rdma cookie (DEST, MAP)
in the same sendmsg call that's actually initiating rdma or atomic
ops.

Disallowing this perverse-but-technically-allowed usage makes silent
RDMA heuristics slightly easier.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Add flag for silent ops. Do atomic op before RDMA
Andy Grover [Tue, 2 Mar 2010 00:10:40 +0000 (16:10 -0800)]
RDS: Add flag for silent ops. Do atomic op before RDMA

Add a flag to the API so users can indicate they want
silent operations. This is needed because silent ops
cannot be used with USE_ONCE MRs, so we can't just
assume silent.

Also, change send_xmit to do atomic op before rdma op if
both are present, and centralize the hairy logic to determine if
we want to attempt silent, or not.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Move some variables around for consistency
Andy Grover [Tue, 2 Mar 2010 00:04:59 +0000 (16:04 -0800)]
RDS: Move some variables around for consistency

Also, add a comment.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: queue failure notifications for dropped atomic ops
Andy Grover [Sat, 20 Feb 2010 02:04:58 +0000 (18:04 -0800)]
RDS: queue failure notifications for dropped atomic ops

When dropping ops in the send queue, we notify the client
of failed rdma ops they asked for notifications on, but not
atomic ops. It should be for both.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Add a warning if trying to allocate 0 sgs
Andy Grover [Thu, 4 Feb 2010 03:41:52 +0000 (19:41 -0800)]
RDS: Add a warning if trying to allocate 0 sgs

rds_message_alloc_sgs() only works when nents is nonzero.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Do not set op_active in r_m_copy_from_user().
Andy Grover [Thu, 4 Feb 2010 03:40:32 +0000 (19:40 -0800)]
RDS: Do not set op_active in r_m_copy_from_user().

Do not allocate sgs for data for 0-length datagrams

Set data.op_active in rds_sendmsg() instead of
rds_message_copy_from_user().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Rewrite rds_send_xmit
Andy Grover [Thu, 4 Feb 2010 03:36:44 +0000 (19:36 -0800)]
RDS: Rewrite rds_send_xmit

Simplify rds_send_xmit().

Send a congestion map (via xmit_cong_map) without
decrementing send_quota.

Move resetting of conn xmit variables to end of loop.

Update comments.

Implement a special case to turn off sending an rds header
when there is an atomic op and no other data.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Rename data op members prefix from m_ to op_
Andy Grover [Thu, 28 Jan 2010 02:04:18 +0000 (18:04 -0800)]
RDS: Rename data op members prefix from m_ to op_

For consistency.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Remove struct rds_rdma_op
Andy Grover [Mon, 1 Mar 2010 22:11:53 +0000 (14:11 -0800)]
RDS: Remove struct rds_rdma_op

A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: purge atomic resources too in rds_message_purge()
Andy Grover [Thu, 28 Jan 2010 00:15:48 +0000 (16:15 -0800)]
RDS: purge atomic resources too in rds_message_purge()

Add atomic_free_op function, analogous to rdma_free_op,
and call it in rds_message_purge().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Inline rdma_prepare into cmsg_rdma_args
Andy Grover [Thu, 28 Jan 2010 00:07:30 +0000 (16:07 -0800)]
RDS: Inline rdma_prepare into cmsg_rdma_args

cmsg_rdma_args just calls rdma_prepare and does a little
arg checking -- not quite enough to justify its existence.
Plus, it is the only caller of rdma_prepare().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Implement silent atomics
Andy Grover [Wed, 20 Jan 2010 05:25:26 +0000 (21:25 -0800)]
RDS: Implement silent atomics

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Move loop-only function to loop.c
Andy Grover [Wed, 20 Jan 2010 02:14:56 +0000 (18:14 -0800)]
RDS: Move loop-only function to loop.c

Also, try to better document the locking around the
rm and its m_inc in loop.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS/IB: Make all flow control code conditional on i_flowctl
Andy Grover [Fri, 15 Jan 2010 23:55:26 +0000 (15:55 -0800)]
RDS/IB: Make all flow control code conditional on i_flowctl

Maybe things worked fine with the flow control code running
even in the non-flow-control case, but making it explicitly
conditional helps the non-fc case be easier to read.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Remove unsignaled_bytes sysctl
Andy Grover [Thu, 14 Jan 2010 23:08:33 +0000 (15:08 -0800)]
RDS: Remove unsignaled_bytes sysctl

Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: rewrite rds_ib_xmit
Andy Grover [Thu, 14 Jan 2010 20:18:11 +0000 (12:18 -0800)]
RDS: rewrite rds_ib_xmit

Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().

Rename sent to bytes_sent, to better differentiate it from the
other variable named "send".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS/IB: Remove ib_[header/data]_sge() functions
Andy Grover [Thu, 14 Jan 2010 00:32:24 +0000 (16:32 -0800)]
RDS/IB: Remove ib_[header/data]_sge() functions

These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS/IB: Remove dead code
Andy Grover [Thu, 14 Jan 2010 00:29:37 +0000 (16:29 -0800)]
RDS/IB: Remove dead code

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS/IB: Disallow connections less than RDS 3.1
Andy Grover [Wed, 13 Jan 2010 23:50:09 +0000 (15:50 -0800)]
RDS/IB: Disallow connections less than RDS 3.1

RDS 3.0 connections (in OFED 1.3 and earlier) put the
header at the end. 3.1 connections put it at the head.
The code has significant added complexity in order to
handle both configurations. In OFED 1.6 we can
drop this and simplify the code by only supporting
"header-first" configuration.

This patch checks the protocol version, and if prior
to 3.1, does not complete the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS/IB: eliminate duplicate code
Andy Grover [Tue, 12 Jan 2010 22:43:06 +0000 (14:43 -0800)]
RDS/IB: eliminate duplicate code

Both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.
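
A rough userspace sketch of the shared-translation idea follows.
All names here (the enums, struct message, complete_fn,
send_complete() and the two callbacks) are hypothetical and the
status values are illustrative; the real code works with the IB
completion codes and the RDS status codes.

```c
#include <assert.h>

/* Hypothetical codes standing in for the IB completion status and
 * the RDS notification status; values are illustrative only. */
enum wc_status { WC_SUCCESS, WC_REM_ACCESS_ERR, WC_GENERAL_ERR };
enum rds_status { ST_SUCCESS, ST_REMOTE_ERROR, ST_OTHER_ERROR };

struct message {
	int rdma_status;	/* last status delivered to the rdma op */
	int atomic_status;	/* last status delivered to the atomic op */
};

typedef void (*complete_fn)(struct message *m, int status);

static void rdma_complete(struct message *m, int status)
{
	m->rdma_status = status;
}

static void atomic_complete(struct message *m, int status)
{
	m->atomic_status = status;
}

/* One translation routine shared by both op types; the caller
 * passes in the function to invoke with the translated code. */
static void send_complete(struct message *m, enum wc_status wc,
			  complete_fn complete)
{
	int status;

	switch (wc) {
	case WC_SUCCESS:
		status = ST_SUCCESS;
		break;
	case WC_REM_ACCESS_ERR:
		status = ST_REMOTE_ERROR;
		break;
	default:
		status = ST_OTHER_ERROR;
		break;
	}

	complete(m, status);
}
```

Passing the callback keeps the switch statement in one place, so
a new status mapping only has to be added once.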

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: inc_purge() transport function unused - remove it
Andy Grover [Tue, 12 Jan 2010 22:41:46 +0000 (14:41 -0800)]
RDS: inc_purge() transport function unused - remove it

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Whitespace
Andy Grover [Tue, 12 Jan 2010 18:53:05 +0000 (10:53 -0800)]
RDS: Whitespace

Tidy up some whitespace issues.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Do not mask address when pinning pages
Andy Grover [Tue, 12 Jan 2010 18:52:28 +0000 (10:52 -0800)]
RDS: Do not mask address when pinning pages

This does not appear to be necessary.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Base init_depth and responder_resources on hw values
Andy Grover [Tue, 12 Jan 2010 18:50:48 +0000 (10:50 -0800)]
RDS: Base init_depth and responder_resources on hw values

Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Implement atomic operations
Andy Grover [Tue, 12 Jan 2010 22:33:38 +0000 (14:33 -0800)]
RDS: Implement atomic operations

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Clear up some confusing code in send_remove_from_sock
Andy Grover [Tue, 12 Jan 2010 22:19:32 +0000 (14:19 -0800)]
RDS: Clear up some confusing code in send_remove_from_sock

The previous code was correct, but made the assumption that
if r_notifier was non-NULL then either r_recverr or r_notify
was true. Valid, but fragile. Changed to explicitly check
r_recverr (shows up in greps for recverr now, too.)

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: make sure all sgs alloced are initialized
Andy Grover [Tue, 12 Jan 2010 22:17:31 +0000 (14:17 -0800)]
RDS: make sure all sgs alloced are initialized

rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so callers need not do this themselves.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: make m_rdma_op a member of rds_message
Andy Grover [Tue, 12 Jan 2010 22:13:15 +0000 (14:13 -0800)]
RDS: make m_rdma_op a member of rds_message

This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer possible to use the m_rdma_op pointer as an
indicator of whether an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: fold rdma.h into rds.h
Andy Grover [Tue, 12 Jan 2010 20:57:27 +0000 (12:57 -0800)]
RDS: fold rdma.h into rds.h

RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Explicitly allocate rm in sendmsg()
Andy Grover [Tue, 12 Jan 2010 20:56:06 +0000 (12:56 -0800)]
RDS: Explicitly allocate rm in sendmsg()

r_m_copy_from_user used to allocate the rm as well as kernel
buffers for the data, and then copy the data in. Now, sendmsg()
allocates the rm, although the data buffer alloc still happens
in r_m_copy_from_user.

SGs are still allocated with rm, but now r_m_alloc_sgs() is
used to reserve them. This allows multiple SG lists to be
allocated from the one rm -- this is important once we also
want to alloc our rdma sgl from this pool.
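
As a minimal sketch of reserving several SG lists from one
message, assuming a fixed-size pool: struct sg, struct message,
POOL_SGS and message_alloc_sgs() are hypothetical names for
illustration and do not reproduce the layout the kernel uses.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SGS 8	/* hypothetical pool size */

/* Hypothetical stand-in for a scatterlist entry. */
struct sg {
	void *page;
	unsigned int len;
};

struct message {
	unsigned int used_sgs;	/* entries already handed out */
	unsigned int total_sgs;	/* entries available in the pool */
	struct sg sgs[POOL_SGS];
};

/* Reserve nents consecutive entries from the message's pool.
 * Returns NULL if nents is zero or the pool is exhausted. */
static struct sg *message_alloc_sgs(struct message *m, unsigned int nents)
{
	struct sg *first;

	assert(m != NULL);
	if (nents == 0 || nents > m->total_sgs - m->used_sgs)
		return NULL;

	first = &m->sgs[m->used_sgs];
	m->used_sgs += nents;
	return first;
}
```

A data SG list and an rdma SG list can then be carved out of the
same message by two successive calls, each getting its own
non-overlapping range.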

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: cleanup/fix rds_rdma_unuse
Andy Grover [Tue, 12 Jan 2010 20:37:17 +0000 (12:37 -0800)]
RDS: cleanup/fix rds_rdma_unuse

First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.

Second, simplify the logic a bit by bailing early (with a warning)
if !mr.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: break out rdma and data ops into nested structs in rds_message
Andy Grover [Tue, 12 Jan 2010 20:15:02 +0000 (12:15 -0800)]
RDS: break out rdma and data ops into nested structs in rds_message

Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons
Andy Grover [Tue, 12 Jan 2010 19:56:44 +0000 (11:56 -0800)]
RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons

Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: move rds_shutdown_worker impl. to rds_conn_shutdown
Andy Grover [Fri, 11 Jun 2010 20:49:13 +0000 (13:49 -0700)]
RDS: move rds_shutdown_worker impl. to rds_conn_shutdown

This fits better in connection.c, rather than threads.c.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Fix locking in send on m_rs_lock
Andy Grover [Mon, 29 Mar 2010 23:50:54 +0000 (16:50 -0700)]
RDS: Fix locking in send on m_rs_lock

Do not nest m_rs_lock under c_lock

Disable interrupts in {rdma,atomic}_send_complete

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Rewrite rds_send_drop_to() for clarity
Andy Grover [Sat, 20 Feb 2010 02:01:41 +0000 (18:01 -0800)]
RDS: Rewrite rds_send_drop_to() for clarity

This function has been the source of numerous bugs; it's just
too complicated. Simplified to nest spinlocks cleanly within
the second loop body, and kick out early if there are no
rms to drop.

This will be a little slower because conn lock is grabbed for
each entry instead of "caching" the lock across rms, but this
should be entirely irrelevant to fastpath performance.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Fix corrupted rds_mrs
Tina Yang [Thu, 1 Apr 2010 21:09:00 +0000 (14:09 -0700)]
RDS: Fix corrupted rds_mrs

On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion.  A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,

Thread #1
t0:
    sock_release
    rds_release
    rds_send_drop_to /* wait on send completion */
t2:
    rds_rdma_drop_keys()   /* destroy & free all mrs */

Thread #2
t1:
    rds_ib_send_cq_comp_handler
    rds_ib_send_unmap_rm
    rds_message_unmapped   /* wake up #1 @ t0 */
t3:
    rds_message_put
    rds_message_purge
    rds_mr_put   /* memory corruption detected */

The problem with rds_rdma_drop_keys() is that it could drop
an mr's refcount more often than it is due (i.e. repeatedly,
as long as the mr remains in the tree (mr->r_refcount > 0)).
In theory it should drop only one reference: the one held
by the tree.

        /* Release any MRs associated with this socket */
        while ((node = rb_first(&rs->rs_rdma_keys))) {
                mr = container_of(node, struct rds_mr, r_rb_node);
                if (mr->r_trans == rs->rs_transport)
                        mr->r_invalidate = 0;
                rds_mr_put(mr);
        }

I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one.  Whichever thread
holds the last reference will free the mr via rds_mr_put().
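
The intended ordering can be modelled in single-threaded userspace
C. This is a sketch, not the patch itself: struct mr, mr_put() and
drop_key() are hypothetical, plain ints stand in for the atomic
refcount and the rb-tree membership, and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an mr; not the real struct rds_mr. */
struct mr {
	int refcount;	/* one reference is owned by the tree */
	int in_tree;	/* still linked into the socket's key tree */
	int destroyed;	/* transport resources released */
	int freed;	/* memory released by the last put */
};

/* Drop one reference; the last holder frees the mr. */
static void mr_put(struct mr *mr)
{
	assert(mr->refcount > 0);
	if (--mr->refcount == 0)
		mr->freed = 1;
}

/* Socket teardown: unlink first, then drop only the tree's reference. */
static void drop_key(struct mr *mr)
{
	if (!mr->in_tree)
		return;

	mr->in_tree = 0;	/* removed from the tree, cannot be found again */
	mr->destroyed = 1;	/* destroy the mapping once */
	mr_put(mr);		/* exactly one put, for the tree's reference */
}
```

With a refcount of two (the tree plus an in-flight message),
drop_key() leaves one reference behind, and the later put from the
send completion path is the one that frees the mr.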

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agoRDS: Fix BUG_ONs to not fire when in a tasklet
Andy Grover [Sat, 13 Mar 2010 00:22:32 +0000 (16:22 -0800)]
RDS: Fix BUG_ONs to not fire when in a tasklet

in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check whether irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.
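
The difference between the two checks can be shown with a small
userspace model. struct cpu_ctx and the model_*() helpers are
hypothetical and only mimic the kernel predicates; they are not
kernel code.

```c
#include <assert.h>

/* Hypothetical model of the CPU context; not kernel code. */
struct cpu_ctx {
	int hardirq;	/* running a hard interrupt handler */
	int softirq;	/* running a softirq or tasklet */
	int irqs_off;	/* local interrupts are disabled */
};

/* Models in_interrupt(): true in hardirq or softirq context. */
static int model_in_interrupt(const struct cpu_ctx *c)
{
	return c->hardirq || c->softirq;
}

/* Models irqs_disabled(): only looks at the local irq state. */
static int model_irqs_disabled(const struct cpu_ctx *c)
{
	return c->irqs_off;
}
```

For a tasklet running with interrupts enabled, the first predicate
is true and the second is false, which is why the old BUG_ONs
fired there and the new ones do not.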

Signed-off-by: Andy Grover <andy.grover@oracle.com>
14 years agonet: poll() optimizations
Eric Dumazet [Mon, 6 Sep 2010 11:13:50 +0000 (11:13 +0000)]
net: poll() optimizations

No need to test twice sk->sk_shutdown & RCV_SHUTDOWN

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agomlx4_en: Fixed Ethtool statistics report
Yevgeny Petrilin [Sun, 5 Sep 2010 22:20:24 +0000 (22:20 +0000)]
mlx4_en: Fixed Ethtool statistics report

The values didn't match the title after removing the LRO
statistics in commit fa37a9586f92051de03a13e55e5ec3880bb6783e

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>