drbd: drbdsetup detach of an unresponsive local disk should not block IO "forever"
author Lars Ellenberg <lars.ellenberg@linbit.com>
Mon, 26 Jan 2015 10:35:38 +0000 (11:35 +0100)
committer Jens Axboe <axboe@fb.com>
Wed, 25 Nov 2015 16:22:01 +0000 (09:22 -0700)
When detaching, we make sure no application IO is in-flight
by internally suspending IO, then trigger the state change,
wait for the result, and finally internally resume IO again.

Once we have triggered the state change to "Failed",
we expect the disk state to change from Failed to Diskless.
(To avoid races, we actually wait for it to leave "Failed").

On an unresponsive local IO backend, this may not happen, ever.
Don't have a "hung" detach block IO "forever", but resume IO
before waiting for the state change to Diskless.

We may well be able to continue IO to and from a healthy peer.
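
As an illustration only, the effect of the reordering can be sketched with a
toy model in plain C. This is not DRBD code: every name below (toy_device,
toy_detach, toy_wait_leave_failed, the resume_before_wait switch) is
hypothetical, and the "wait" is reduced to a function that reports whether
the disk state would ever leave Failed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in DRBD. */
enum toy_disk_state { TOY_ATTACHED, TOY_FAILED, TOY_DISKLESS };

struct toy_device {
	enum toy_disk_state disk;
	bool io_suspended;     /* true while application IO is held back */
	bool backend_responds; /* false models an unresponsive local disk */
};

/*
 * Stand-in for waiting until the disk leaves TOY_FAILED.
 * Returns false where a real wait would never return.
 */
static bool toy_wait_leave_failed(struct toy_device *dev)
{
	if (dev->backend_responds)
		dev->disk = TOY_DISKLESS;
	return dev->disk != TOY_FAILED;
}

/*
 * resume_before_wait == true models the ordering after this change,
 * false models the ordering before it.
 * Returns false if the modelled wait never completes.
 */
static bool toy_detach(struct toy_device *dev, bool resume_before_wait)
{
	assert(dev->disk == TOY_ATTACHED);

	dev->io_suspended = true;  /* no application IO in flight */
	dev->disk = TOY_FAILED;    /* trigger the state change */

	if (resume_before_wait)
		dev->io_suspended = false;

	if (!toy_wait_leave_failed(dev))
		return false;      /* detach is "hung" from here on */

	dev->io_suspended = false; /* old ordering resumes only here */
	return true;
}
```

In this model, a device with backend_responds == false ends up with
io_suspended still set under the old ordering, and cleared under the new
one, even though the detach itself does not complete in either case.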

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
drivers/block/drbd/drbd_nl.c

index af78f0906cb20005ce53c2f8d44d74c60f19ccd5..331b378b7d0b233fee4af55442187fbc9b319eaf 100644
@@ -1929,9 +1929,9 @@ static int adm_detach(struct drbd_device *device, int force)
        retcode = drbd_request_state(device, NS(disk, D_FAILED));
        drbd_md_put_buffer(device);
        /* D_FAILED will transition to DISKLESS. */
+       drbd_resume_io(device);
        ret = wait_event_interruptible(device->misc_wait,
                        device->state.disk != D_FAILED);
-       drbd_resume_io(device);
        if ((int)retcode == (int)SS_IS_DISKLESS)
                retcode = SS_NOTHING_TO_DO;
        if (ret)