vSAN RAID Levels and Fault Domains
One of my customers is considering implementing vSAN, so I've been researching it quite a bit lately. The interaction between vSAN RAID levels (for all-flash configurations) and Fault Domains is fairly complex, so I figured I should post some notes about what I've learned here.
First, the concept of RAID is a little different in vSAN than it is in a traditional array. Traditionally, RAID specifies the algorithm used to spread data (or parity) across a set of disks. For example, RAID 5 stripes data across all of the disks in a set, with a single disk's worth of capacity used for parity. This means that a 3-disk RAID 5 set stores data on roughly 66% of its disks' capacity, while a 5-disk RAID 5 set stores data on 80%.
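That capacity arithmetic can be sketched in a few lines of Python (the function name is mine, just for illustration):

```python
# Usable-capacity fraction of a traditional n-disk RAID 5 set:
# data is striped across all n disks, with one disk's worth of
# capacity consumed by parity, so (n - 1) / n holds data.
def raid5_usable_fraction(n_disks: int) -> float:
    if n_disks < 3:
        raise ValueError("RAID 5 requires at least 3 disks")
    return (n_disks - 1) / n_disks

print(raid5_usable_fraction(3))  # 3 disks -> ~0.667
print(raid5_usable_fraction(5))  # 5 disks -> 0.8
```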
vSAN treats RAID differently. vSAN supports 3 RAID types: RAID 1, 5, and 6. As in a traditional array, these RAID levels describe the data redundancy algorithm used, but the members are counted a little differently. Just as you'd expect, a vSAN RAID 5 configuration will have 1 parity block, but how many data blocks will it use? VMware requires at least 4 ESXi hosts to support RAID 5, and each of those hosts might have 3 or more capacity-tier SSDs. So, with 4 hosts of 3 SSDs each, does that mean a RAID 5 configuration is striped across all 12 SSDs, with roughly 92% functional capacity (the way a traditional 12-disk RAID 5 configuration would work)? Nope.
vSAN RAID configurations have fixed numbers of data and parity blocks. As you can probably intuit from the 4-host requirement, a RAID 5 configuration creates 3 data blocks and 1 parity block, meaning that data occupies 75% of the capacity it consumes. For each stripe of data, each of the 4 participating hosts will hold 1 block (either data or parity) on 1 of its SSDs. If that host (or that SSD) goes offline for longer than the allowed period, that data (or parity) will be rebuilt on a remaining resource (if one is available). So, even if you have 20 hosts with 5 drives each, a given stripe of a RAID 5 VMDK will contain data blocks on 3 hosts and a parity block on 1 host.
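The parity in that 3 + 1 stripe works like classic RAID 5 parity: the parity block is the XOR of the three data blocks, so any single lost block can be recomputed from the surviving three. A minimal illustration (the block contents here are made up for the example):

```python
from functools import reduce

# Three data blocks plus one XOR parity block, mirroring vSAN's
# fixed 3 + 1 RAID 5 stripe layout.
data_blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

parity = xor_blocks(data_blocks)

# Simulate losing data block 1 (say its host left the cluster for
# longer than the allowed period): XOR the survivors to rebuild it.
surviving = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(surviving)
assert rebuilt == data_blocks[1]  # the lost block is recovered
```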
RAID 1 and RAID 6 are handled in a similar fashion. RAID 6 stripes use 4 data blocks and 2 parity blocks, requiring 6 ESXi hosts and providing 66% functional capacity.
RAID 1 is a full mirror copy, meaning that for every data block written, a second copy of that same data block is written somewhere else. RAID 1 can also be configured to tolerate 2 failed devices, in which case it writes 3 copies of every block in total. The functional capacity of these RAID 1 configurations is 50% and 33%, respectively.
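All of these functional-capacity figures fall out of the same ratio, data blocks over total blocks per stripe (or per replica set, for mirrors). A quick sketch of the arithmetic:

```python
def functional_capacity(data_blocks: int, total_blocks: int) -> float:
    """Fraction of consumed capacity that holds data (vs. parity/mirror copies)."""
    return data_blocks / total_blocks

# Fixed member counts per vSAN scheme: (data blocks, total blocks).
schemes = {
    "RAID 5 (3 + 1)":     (3, 4),  # 75%
    "RAID 6 (4 + 2)":     (4, 6),  # ~66%
    "RAID 1, 1 failure":  (1, 2),  # two full copies, 50%
    "RAID 1, 2 failures": (1, 3),  # three full copies, ~33%
}

for name, (data, total) in schemes.items():
    print(f"{name}: {functional_capacity(data, total):.0%}")
```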
So, that leaves the question of how all of this interacts with Fault Domains. A Fault Domain is a group of ESXi hosts that depend on the same hardware, such as the same Top of Rack switch or Power Distribution Unit or whatever. They're considered to be in the same Fault Domain because, should that hardware fail, the expectation is that all ESXi hosts in that Fault Domain will also fail.
By default, vSAN considers each ESXi host to be its own Fault Domain. So, when we say that a RAID 5 configuration requires 4 ESXi hosts, we actually mean that it requires 4 Fault Domains, so that vSAN can distribute the components of that stripe for proper redundancy. If you have 4 Fault Domains, with 3 ESXi hosts each, that RAID 5 stripe is going to select one host from each Fault Domain for the components of the stripe (and in fact, it will select one capacity tier device on the selected host in each Fault Domain).
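That placement logic can be pictured as picking one Fault Domain per stripe component, then one host within it, then one capacity device on that host. This is only an illustrative sketch of the idea, not vSAN's actual placement algorithm (which also weighs free capacity and balance); the cluster layout here is hypothetical:

```python
import random

# Hypothetical cluster: 4 fault domains, 3 hosts each, 5 capacity SSDs per host.
fault_domains = {
    f"fd{i}": {f"fd{i}-host{j}": [f"ssd{k}" for k in range(5)] for j in range(3)}
    for i in range(4)
}

def place_raid5_stripe(domains):
    """Pick one (fault domain, host, device) triple per component of a 3 + 1 stripe."""
    if len(domains) < 4:
        raise ValueError("RAID 5 needs at least 4 fault domains")
    placement = []
    for fd in random.sample(sorted(domains), 4):   # one FD per component
        host = random.choice(sorted(domains[fd]))  # one host within that FD
        device = random.choice(domains[fd][host])  # one capacity device on that host
        placement.append((fd, host, device))
    return placement

components = ["data1", "data2", "data3", "parity"]
for component, (fd, host, dev) in zip(components, place_raid5_stripe(fault_domains)):
    print(component, "->", fd, host, dev)
```

Because each component lands in a different Fault Domain, losing an entire rack's worth of hosts costs the stripe at most one block.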
So, the summary is that vSAN RAID != traditional RAID, but it does share some basic concepts (striping and redundancy). vSAN RAID levels have static numbers of members: RAID 5 is 3 + 1, RAID 6 is 4 + 2. The Fault Domain is the vSAN equivalent of a disk in a traditional RAID model, as data blocks are spread across Fault Domains rather than across specific disks. Each host is its own Fault Domain by default, but administrators can group hosts into specific Fault Domains based on upstream dependencies or even just for maintenance grouping.