ceph: fix connection fault con_work reentrancy problem
author    Sage Weil <sage@newdream.net>
          Thu, 18 Mar 2010 22:20:53 +0000 (15:20 -0700)
committer Sage Weil <sage@newdream.net>
          Tue, 23 Mar 2010 14:46:59 +0000 (07:46 -0700)
The messenger fault handler (ceph_fault) was clearing the BUSY bit, for
reasons unclear.  This made it possible for the con->ops->fault function
to reopen the connection and requeue work in the workqueue--even though
the current thread was already in con_work.
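
The reentrancy described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the messenger
code: every sketch_* name is hypothetical, the BUSY flag is modeled with
C11 atomics rather than the kernel's bit operations, and the workqueue is
reduced to a counter. It assumes that work can only be queued while BUSY
is clear, which is a simplification of the real queueing logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the BUSY flag kept in con->state. */
enum { SKETCH_BUSY = 1u << 0 };

struct sketch_con {
    atomic_uint state;  /* bit flags, loosely modeled on con->state */
    int queued;         /* how many times work was queued */
};

/* Queue work only if no worker currently owns the connection. */
static bool sketch_queue_con(struct sketch_con *con)
{
    unsigned old = atomic_fetch_or(&con->state, SKETCH_BUSY);

    if (old & SKETCH_BUSY)
        return false;   /* already busy: refuse to queue again */
    con->queued++;
    return true;
}

/*
 * Fault path, called from inside the worker that already holds BUSY.
 * With clear_busy set, it mimics the removed clear_bit() call.
 */
static void sketch_fault(struct sketch_con *con, bool clear_busy)
{
    if (clear_busy)
        atomic_fetch_and(&con->state, ~(unsigned)SKETCH_BUSY);

    /* Stands in for a fault callback that reopens and requeues. */
    sketch_queue_con(con);
}

/* Return how many requeues happen while the worker is still running. */
static int sketch_requeues_during_fault(bool clear_busy)
{
    struct sketch_con con;
    bool ok;
    int before;

    atomic_init(&con.state, 0);
    con.queued = 0;

    ok = sketch_queue_con(&con);    /* the worker starts, BUSY is set */
    assert(ok);
    (void)ok;                       /* keep NDEBUG builds warning-free */
    before = con.queued;

    sketch_fault(&con, clear_busy); /* fault hits inside the worker */
    return con.queued - before;
}
```

In this model, sketch_requeues_during_fault(true) reports one requeue
issued while the worker is still running, and
sketch_requeues_during_fault(false) reports none, because the BUSY flag
still guards the connection.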

This avoids a problem where the client busy loops with connection failures
on an unreachable OSD, but doesn't address the root cause of that problem.

Signed-off-by: Sage Weil <sage@newdream.net>
fs/ceph/messenger.c

index 203c4359b549a21071e3800f119aaae5c42184a0..983285540945451149c1363ea65a098d4f27c2ca 100644
@@ -1836,8 +1836,6 @@ static void ceph_fault(struct ceph_connection *con)
                goto out;
        }
 
-       clear_bit(BUSY, &con->state);  /* to avoid an improbable race */
-
        mutex_lock(&con->mutex);
        if (test_bit(CLOSED, &con->state))
                goto out_unlock;