libceph: handle connection reopen race with callbacks
author Sage Weil <sage@newdream.net>
Thu, 19 May 2011 18:21:05 +0000 (11:21 -0700)
committer Sage Weil <sage@newdream.net>
Thu, 19 May 2011 18:21:05 +0000 (11:21 -0700)
commit 0da5d70369e87f80adf794080cfff1ca15a34198
tree b9d2fcaa52903e1c9b87ad7edfc24fb294320bce
parent 3b663780347ce532b08be1c859b1df14f0eea4c8

If a connection is closed and/or reopened (ceph_con_close, ceph_con_open),
it can race with a callback.  con_work does various state checks for
closed or reopened sockets at the beginning, but drops con->mutex before
making callbacks.  We need to check for state bit changes after retaking
the lock so that we restart con_work and rerun those CLOSED/OPENING
tests; otherwise we may end up operating under stale assumptions.

In Jim's case, this was causing 'bad tag' errors.

There are four cases where we re-take the con->mutex inside con_work: catch
them all and return EAGAIN from try_{read,write} so that we can restart
con_work.
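
The pattern described above can be illustrated with a small userspace
sketch.  This is not the code in net/ceph/messenger.c: struct sketch_con,
sketch_try_read, sketch_con_work, reopen_cb and sketch_demo are all
hypothetical names, a pthread mutex stands in for con->mutex, and two
booleans stand in for the CLOSED and OPENING state bits.  For determinism
the callback itself performs the reopen; in practice the close/reopen may
come from elsewhere while the lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection; NOT the kernel's struct. */
struct sketch_con {
    pthread_mutex_t mutex;
    bool closed;    /* stands in for the CLOSED state bit */
    bool opening;   /* stands in for the OPENING state bit */
    int pending;    /* messages still to hand to the callback */
    int restarts;   /* how often the work function restarted */
    void (*deliver)(struct sketch_con *con); /* runs with mutex dropped */
};

/* Called with con->mutex held; returns -EAGAIN if state changed under us. */
static int sketch_try_read(struct sketch_con *con)
{
    while (con->pending > 0) {
        con->pending--;
        pthread_mutex_unlock(&con->mutex); /* drop the lock for the callback */
        con->deliver(con);                 /* may close or reopen con */
        pthread_mutex_lock(&con->mutex);   /* retake the lock */

        /* Our earlier checks may be stale: make the caller redo them. */
        if (con->closed || con->opening)
            return -EAGAIN;
    }
    return 0;
}

static void sketch_con_work(struct sketch_con *con)
{
    pthread_mutex_lock(&con->mutex);
restart:
    if (con->closed)
        goto out;               /* nothing more to do on a closed con */
    if (con->opening)
        con->opening = false;   /* pretend the socket is reset here */

    if (sketch_try_read(con) == -EAGAIN) {
        con->restarts++;
        goto restart;           /* rerun the CLOSED/OPENING checks */
    }
out:
    pthread_mutex_unlock(&con->mutex);
}

/* Callback that marks the connection as reopened while the lock is dropped. */
static void reopen_cb(struct sketch_con *con)
{
    pthread_mutex_lock(&con->mutex);
    con->opening = true;
    pthread_mutex_unlock(&con->mutex);
}

/* Runs the scenario once and returns the number of restarts. */
int sketch_demo(void)
{
    struct sketch_con con = { .pending = 1, .deliver = reopen_cb };

    pthread_mutex_init(&con.mutex, NULL);
    sketch_con_work(&con);
    assert(!con.opening);       /* the restart handled the reopen */
    pthread_mutex_destroy(&con.mutex);
    return con.restarts;
}
```

In this sketch, one reopen during the callback leads to exactly one
restart of the work function; without the check after retaking the lock,
the read loop would continue against a connection whose state had changed.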

Reported-by: Jim Schutt <jaschut@sandia.gov>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
net/ceph/messenger.c