MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1)
The MD RAID10 'far' and 'offset' algorithms make copies of entire stripe
widths - copying them to a different location on the same devices after
shifting the stripe. An example layout of each follows below:
"far" algorithm
dev1 dev2 dev3 dev4 dev5 dev6
==== ==== ==== ==== ==== ====
A B C D E F
G H I J K L
...
F A B C D E --> Copy of stripe0, but shifted by 1
L G H I J K
...
"offset" algorithm
dev1 dev2 dev3 dev4 dev5 dev6
==== ==== ==== ==== ==== ====
A B C D E F
F A B C D E --> Copy of stripe0, but shifted by 1
G H I J K L
L G H I J K
...
Redundancy for these algorithms is gained by shifting the copied stripes
one device to the right. This patch proposes that array be divided into
sets of adjacent devices and when the stripe copies are shifted, they wrap
on set boundaries rather than the array size boundary. That is, for the
purposes of shifting, the copies are confined to their sets within the
array. The sets are 'near_copies * far_copies' in size.
The above "far" algorithm example would change to:
"far" algorithm
dev1 dev2 dev3 dev4 dev5 dev6
==== ==== ==== ==== ==== ====
A B C D E F
G H I J K L
...
B A D C F E --> Copy of stripe0, shifted 1, 2-dev sets
H G J I L K Dev sets are 1-2, 3-4, 5-6
...
This has the affect of improving the redundancy of the array. We can
always sustain at least one failure, but sometimes more than one can
be handled. In the first examples, the pairs of devices that CANNOT fail
together are:
(1,2) (2,3) (3,4) (4,5) (5,6) (1, 6) [40% of possible pairs]
In the example where the copies are confined to sets, the pairs of
devices that cannot fail together are:
(1,2) (3,4) (5,6) [20% of possible pairs]
We cannot simply replace the old algorithms, so the 17th bit of the 'layout'
variable is used to indicate whether we use the old or new method of computing
the shift. (This is similar to the way the 16th bit indicates whether the
"far" algorithm or the "offset" algorithm is being used.)
This patch only handles the cases where the number of total raid disks is
a multiple of 'far_copies'. A follow-on patch addresses the condition where
this is not true.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>