md: Don't update ->recovery_offset when reshaping an array to fewer devices.
authorNeilBrown <neilb@suse.de>
Wed, 16 Jun 2010 07:01:25 +0000 (17:01 +1000)
committerNeilBrown <neilb@suse.de>
Thu, 24 Jun 2010 03:35:18 +0000 (13:35 +1000)
When an array is reshaped to have fewer devices, the reshape proceeds
from the end of the devices to the beginning.

If a device happens to be non-In_sync (which is possible but rare)
we would normally update the ->recovery_offset as the reshape
progresses. However that would be wrong as the recover_offset records
that the early part of the device is in_sync, while in fact it would
only be the later part that is in_sync, and in any case the offset
number would be measured from the wrong end of the device.

Relatedly, if after a reshape a spare is discovered to not be
recoverred all the way to the end, not allow spare_active
to incorporate it in the array.

This becomes relevant in the following sample scenario:

A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
operation.
The RAID5->RAID6 conversion will cause a 5 drive to be included as a
spare, then the 5drive -> 6drive reshape will effectively rebuild that
spare as it progresses.  The 6th drive is treated as in_sync the whole
time as there is never any case that we might consider reading from
it, but must not because there is no valid data.

If we interrupt this reshape part-way through and reverse it to return
to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
the recovery_offset - as that would be wrong - and we don't want to
include that spare as active in the 5-drive RAID6 when the reversed
reshape completed and it will be mostly out-of-sync still.

Signed-off-by: NeilBrown <neilb@suse.de>
drivers/md/md.c
drivers/md/raid5.c

index 4869128bf742b15dfe3601e6b8851eb79168c0f2..cb20d0b0555adee7c12d5e643a65d142f7f054e0 100644 (file)
@@ -2087,6 +2087,7 @@ static void sync_sbs(mddev_t * mddev, int nospares)
        /* First make sure individual recovery_offsets are correct */
        list_for_each_entry(rdev, &mddev->disks, same_set) {
                if (rdev->raid_disk >= 0 &&
+                   mddev->delta_disks >= 0 &&
                    !test_bit(In_sync, &rdev->flags) &&
                    mddev->curr_resync_completed > rdev->recovery_offset)
                                rdev->recovery_offset = mddev->curr_resync_completed;
@@ -6872,6 +6873,7 @@ void md_do_sync(mddev_t *mddev)
                        rcu_read_lock();
                        list_for_each_entry_rcu(rdev, &mddev->disks, same_set)
                                if (rdev->raid_disk >= 0 &&
+                                   mddev->delta_disks >= 0 &&
                                    !test_bit(Faulty, &rdev->flags) &&
                                    !test_bit(In_sync, &rdev->flags) &&
                                    rdev->recovery_offset < mddev->curr_resync)
index 2c055dec8c6808fad15c501f51a5d953872b8799..f972a94bbc324273cdc53e46f12a2e72d7ad5b1d 100644 (file)
@@ -5208,6 +5208,7 @@ static int raid5_spare_active(mddev_t *mddev)
        for (i = 0; i < conf->raid_disks; i++) {
                tmp = conf->disks + i;
                if (tmp->rdev
+                   && tmp->rdev->recovery_offset == MaxSector
                    && !test_bit(Faulty, &tmp->rdev->flags)
                    && !test_and_set_bit(In_sync, &tmp->rdev->flags)) {
                        unsigned long flags;