Introduction
Choosing the right data storage solution can make or break your system’s performance and reliability. If you’re working in a Hyper-V environment, you’ve probably heard of StarWind Virtual SAN (VSAN) and Microsoft Storage Spaces Direct (S2D). But which one should you go for? In this article, we dive deep into these two solutions, comparing their performance, capacity efficiency, and practical application in a 2-node Hyper-V cluster setup. By the end, you’ll have a clearer picture of which solution might be your perfect match.
To compare these two solutions fairly, we set up a 2-node Hyperconverged Infrastructure (HCI) Hyper-V cluster under two different configurations:
StarWind VSAN NVMe-oF over TCP
- Host Mirroring + MDRAID-5.
Microsoft Storage Spaces Direct over TCP
- Nested mirror-accelerated parity, workload placed in the mirror tier.
- Nested mirror-accelerated parity, workload placed in both tiers – mirror and parity.
Solutions overview
StarWind VSAN for Hyper-V NVMe-oF over TCP scenario:
In this setup, each Hyper-V node is equipped with 5x NVMe drives passed through to the StarWind Controller Virtual Machine (CVM). Inside the CVM, the drives are assembled into an MDRAID5 array. On top of this array, two StarWind High Availability (HA) devices are created, ensuring data replication and continuous availability. The StarWind NVMe-oF Initiator, chosen because Windows currently lacks a native Microsoft NVMe-oF initiator (Microsoft is expected to introduce NVMe-oF support in Windows Server 2025, with TCP transport only), connects these devices to the nodes. Cluster Shared Volumes (CSVs) are then created on these connected devices.
Microsoft Storage Spaces Direct over TCP scenario – Nested mirror-accelerated parity:
DISCLAIMER: We’re aware that disk witness isn’t officially supported with S2D. However, for the sake of our benchmarking and to speed up deployment, we chose to proceed with it. That said, do not use disk witness in your production S2D cluster.
This scenario tested two configurations of S2D, focusing on a Nested mirror-accelerated parity that provides the optimal balance between performance and capacity efficiency:
- Workload placed in the mirror tier: Maximizes performance by keeping data in the faster mirror tier.
- Workload placed in both tiers: Simulates a more balanced scenario where data moves between the mirror and parity tiers, reflecting real-world conditions (when the workload does not fit in the mirror tier and Resilient File System (ReFS) begins to move data to the parity tier). We also tried to achieve a behavior where writes were sent directly to the parity tier – the worst-case scenario.
In reality, with production workloads, the performance will likely fall somewhere between these two cases.
Two storage tiers are created with different resiliency settings – Mirror for performance and Parity for capacity – with the following parameters:
```powershell
New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedPerformance -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4
New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedCapacity -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk -NumberOfColumns 4
```
Volumes are allocated with 20% in the mirror tier and 80% in the parity tier, adhering to Microsoft’s recommendations:
```powershell
New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume01 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB
New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume02 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB
```
ReFS manages the data movement between these tiers to optimize performance. The threshold value, at which ReFS starts moving data between the tiers, was left at the default – 85%.
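As a sanity check on the tier sizes above: for a 4096 GB volume, a 20/80 split lands within a gigabyte of the 820 GB and 3276 GB figures used in the `New-Volume` commands. A minimal sketch in Python (the function name and rounding behavior are ours, not part of any S2D tooling):

```python
def tier_split(total_gb: int, mirror_pct: float = 0.20) -> tuple[int, int]:
    """Split a volume into mirror/parity tier sizes (GB), rounding to whole GB."""
    mirror_gb = round(total_gb * mirror_pct)  # faster, nested four-copy mirror tier
    parity_gb = total_gb - mirror_gb          # capacity-oriented nested parity tier
    return mirror_gb, parity_gb

# 20% mirror / 80% parity for a 4096 GB volume
print(tier_split(4096))
```

The exact published figures round the mirror tier up to 820 GB; the arithmetic is otherwise the same.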
Capacity efficiency
Capacity efficiency is a big deal when evaluating storage solutions:
- StarWind VSAN for Hyper-V NVMe-oF: Achieves a capacity efficiency of 40%, thanks to its combination of host mirroring and MDRAID-5.
- Microsoft S2D Nested mirror-accelerated parity: Delivers a capacity efficiency of 35.7% (20% mirror, 80% parity), though this can vary depending on the percentage of the volume allocated to the mirror tier. For more details on how to calculate capacity efficiency for Nested mirror-accelerated parity, please refer to the provided link.
Microsoft recommends leaving some capacity in the storage pool unallocated to give volumes space to repair "in-place" after a drive failure. If sufficient capacity exists, an immediate, in-place, parallel repair can restore volumes to full resiliency even before the failed drives are replaced. This happens automatically. So, in our setup, the recommended reserve space is 5.82 TB (20% of the total pool size).
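Both figures are easy to verify. StarWind's 40% follows from two-way host mirroring (x1/2) stacked on a 5-drive MDRAID-5 (x4/5), and the 5.82 TB reserve is 20% of the pool as Windows reports it, i.e., 10 drives of 3.2 TB counted in binary terabytes. A quick check (drive count and sizes are from our testbed; the variable names are ours):

```python
# Capacity efficiency: 2-way host mirror (1/2) on top of a 5-drive RAID-5 (4/5)
mirror_factor = 1 / 2
raid5_factor = (5 - 1) / 5
starwind_efficiency = mirror_factor * raid5_factor
print(f"StarWind usable/raw: {starwind_efficiency:.0%}")   # 40%

# Recommended S2D repair reserve: 20% of the pool
drives = 10                           # 5 NVMe per node, 2 nodes
drive_bytes = 3.2e12                  # 3.2 TB (decimal) per drive
pool_tib = drives * drive_bytes / 2**40  # pool size in binary TB, as Windows shows it
print(f"Pool: {pool_tib:.2f} TB, reserve: {pool_tib * 0.2:.2f} TB")
```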
When planning your solution, consider these factors to ensure you get the best performance and efficiency for your needs.
Testbed overview
When it comes to evaluating storage solutions like StarWind VSAN for Hyper-V NVMe-oF over TCP and Microsoft S2D over TCP, we didn’t cut any corners. Our testbed was robust and meticulously configured to simulate real-world environments, ensuring our findings are relevant and reliable. Here’s a breakdown of the hardware and software setups that powered our tests:
Hardware:
Server model | Supermicro SYS-220U-TNR |
---|---|
CPU | Intel(R) Xeon(R) Platinum 8352Y @2.2GHz |
Sockets | 2 |
Cores/Threads | 64/128 |
RAM | 256GB |
NICs | 2x Mellanox ConnectX®-6 EN 200GbE (MCX613106A-VDA) |
Storage | 5x NVMe Micron 7450 MAX: U.3 3.2TB |
Software:
Windows Server | Windows Server 2022 Datacenter 21H2 OS build 20348.2527 |
---|---|
StarWind VSAN | Version V8 (build 15469, CVM 20240530) (kernel – 5.15.0-113-generic) |
StarWind NVMe-oF Initiator | StarWind NVMe-oF Initiator.2.0.0.672(rev 674).Setup.486 |
StarWind CVM parameters:
CPU | 24 vCPU |
---|---|
RAM | 32GB |
NICs | 1x network adapter for management; 4x network adapters for client IO and synchronization |
Storage | MDRAID5 (5x NVMe Micron 7450 MAX: U.3 3.2TB) |
Testing methodology
The benchmarks were conducted using the FIO utility in the client/server mode. We configured a total of 20 virtual machines (VMs), with 10 VMs hosted on each server node. Each VM was allocated 4 vCPUs, 8GB of RAM, and three RAW virtual disks connected to separate SCSI controllers.
Test Scenarios:
Microsoft Storage Spaces Direct (S2D)
- Nested mirror-accelerated parity (Mirror-only): For scenarios where the workload is placed entirely in the mirror tier, each virtual disk size was 10GB.
- Nested mirror-accelerated parity (Both tiers): For scenarios utilizing both the mirror and parity tiers, each virtual disk size was 100GB.
StarWind VSAN NVMe-oF
- For all tests, each virtual disk size was 100GB.
Data Patterns Tested:
- 4k random read
- 4k random read/write (70/30)
- 4k random write
- 64k random read
- 64k random write
- 1M read
- 1M write
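For reference, here is roughly how each pattern name maps onto fio parameters. This helper is a sketch, not our actual job files (those also carried the per-test numjobs/iodepth values from the tables below); `--rw`, `--bs`, `--rwmixread`, `--runtime`, and `--time_based` are standard fio options:

```python
def fio_args(pattern: str, numjobs: int, iodepth: int, runtime_s: int) -> list[str]:
    """Translate a human-readable pattern name into fio CLI arguments."""
    rw_map = {
        "4k random read":             ("randread",  "4k",  None),
        "4k random read/write 70/30": ("randrw",    "4k",  70),
        "4k random write":            ("randwrite", "4k",  None),
        "64k random read":            ("randread",  "64k", None),
        "64k random write":           ("randwrite", "64k", None),
        "1M read":                    ("read",      "1m",  None),
        "1M write":                   ("write",     "1m",  None),
    }
    rw, bs, rwmix = rw_map[pattern]
    args = [f"--rw={rw}", f"--bs={bs}", f"--numjobs={numjobs}",
            f"--iodepth={iodepth}", f"--runtime={runtime_s}",
            "--time_based", "--direct=1", "--ioengine=libaio"]
    if rwmix is not None:
        args.append(f"--rwmixread={rwmix}")  # read share of the mixed workload
    return args

print(" ".join(fio_args("4k random read/write 70/30", numjobs=3, iodepth=16, runtime_s=1800)))
```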
Pre-Test Warm-Up:
Before running specific tests, we filled the virtual disks with random data and warmed them up using the corresponding patterns to ensure stable performance:
- 4k random read/write (70/30) and 4k random write: VM disks were warmed up with a 4k random write pattern for 4 hours.
- 64k random write: VM disks were warmed up with a 64k random write pattern for 2 hours.
Test Execution:
- Duration: Read tests were conducted for 600 seconds, and write tests lasted 1800 seconds.
- Repetition: All tests were repeated three times, and the average value was used as the final result.
Specific Configurations:
- Microsoft Storage Spaces Direct (S2D): Following Microsoft's recommendations for the S2D scenario, test VMs were placed on the CSV owner node to avoid redirecting requests to another node, ensuring local data reads without the network stack and reducing network utilization on writes. Each VHDX file was placed in a separate subdirectory to optimize ReFS metadata operations and reduce latency.
- StarWind VSAN for Hyper-V NVMe-oF: VMs were evenly distributed across hosts without being pinned to the node that owns the volume. Each VHDX file was placed in a separate subdirectory to maintain consistent performance.
Benchmarking local NVMe performance
Before diving into the full evaluation, we ran a series of tests to check whether the NVMe drives lived up to the vendor's promises. Here is the image with vendor-claimed performance:
Using the FIO utility in client/server mode, we checked how well the NVMe SSDs in our server performed in a local storage setup. Our local storage tests used different patterns to see how the NVMe SSDs handled different kinds of data. The following results have been achieved:
1x NVMe Micron 7450 MAX: U.3 3.2TB

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
---|---|---|---|---|---
4k random read | 6 | 32 | 997,000 | 3,894 | 0.192 |
4k random read/write 70/30 | 6 | 16 | 531,000 | 2,073 | 0.142 |
4k random write | 4 | 4 | 385,000 | 1,505 | 0.041 |
64k random read | 8 | 8 | 92,900 | 5,807 | 0.688 |
64k random write | 2 | 1 | 27,600 | 1,724 | 0.072 |
1M read | 1 | 8 | 6,663 | 6,663 | 1.200 |
1M write | 1 | 2 | 5,134 | 5,134 | 0.389 |
Our tests showed that the NVMe drives lived up to what the vendor promised. Whether handling small 4k reads or large 1M writes, they delivered on speed and consistency.
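A useful cross-check on any such table is that IOPS and throughput must agree with the block size (MiB/s = IOPS x block size in KiB / 1024). The numbers above hold up; for example (helper name is ours):

```python
def mibps(iops: float, block_kib: float) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a given block size."""
    return iops * block_kib / 1024

# 4k random read: 997,000 IOPS implies ~3,894 MiB/s, matching the table
print(mibps(997_000, 4))
# 1M read: at a 1 MiB block size, MiB/s equals IOPS (6,663 in the table)
print(mibps(6_663, 1024))
```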
Benchmark results in a table
The benchmarking results are presented in tables to illustrate performance metrics such as IOPS, throughput (MiB/s), latency (ms), and CPU usage. An additional metric, “IOPS per 1% CPU usage,” highlights the performance dependency on the CPU usage for 4k random read/write patterns. This parameter is calculated using the following formula:
IOPS per 1% CPU usage = IOPS / Node count / Node CPU usage
Where:
- IOPS represents the number of I/O operations per second for each pattern.
- Node count is 2 nodes in our case.
- Node CPU usage denotes the CPU usage of one node during the test.
By incorporating this additional metric, we aimed to provide deeper insights into how CPU usage correlates with IOPS, offering a more nuanced understanding of performance characteristics.
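In code form, taking the first 4k random read row of the StarWind table as a worked example (420,000 IOPS across 2 nodes at 45% per-node CPU usage):

```python
def iops_per_cpu_pct(iops: float, node_count: int, node_cpu_pct: float) -> float:
    """IOPS delivered per 1% of a single node's CPU usage."""
    return iops / node_count / node_cpu_pct

# 420,000 IOPS, 2 nodes, 45% CPU -> 4,667, as listed in the table
print(round(iops_per_cpu_pct(420_000, 2, 45)))
```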
Now let’s delve into the detailed benchmark results for each storage configuration.
StarWind VSAN for Hyper-V NVMe-oF over TCP scenario
The table illustrates StarWind VSAN's performance under various workload patterns and configurations. For 4k random reads, IOPS range from 420,000 at a 4 IO depth to 881,000 at a 128 IO depth. In the mixed 4k random read/write (70/30) test, it achieves up to 561,000 IOPS, showcasing its prowess in handling mixed workloads.
In the 64k and 1M read/write patterns, StarWind VSAN NVMe-oF reaches up to 15.2 GB/s, demonstrating its ability to handle such workloads effectively.
VM count | Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) | Node CPU usage % | IOPS per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 420,000 | 1,641 | 0.570 | 45.00% | 4,667 |
4k random read | 3 | 8 | 307,000 | 1,200 | 1.515 | 35.00% | 4,386 | |
4k random read | 3 | 16 | 546,000 | 2,134 | 1.736 | 50.00% | 5,460 | |
4k random read | 3 | 32 | 741,000 | 2,895 | 2.586 | 57.00% | 6,500 | |
4k random read | 3 | 64 | 836,000 | 3,265 | 4.567 | 58.00% | 7,207 | |
4k random read | 3 | 128 | 881,000 | 3,442 | 8.827 | 60.00% | 7,342 | |
4k random read/write (70%/30%) | 3 | 2 | 241,500 | 943 | 0.582 | 39.00% | 3,096 | |
4k random read/write (70%/30%) | 3 | 4 | 334,000 | 1,305 | 0.843 | 45.00% | 3,711 | |
4k random read/write (70%/30%) | 3 | 8 | 301,200 | 1,177 | 1.683 | 42.00% | 3,586 | |
4k random read/write (70%/30%) | 3 | 16 | 416,000 | 1,625 | 2.507 | 48.00% | 4,333 | |
4k random read/write (70%/30%) | 3 | 32 | 534,000 | 2,086 | 4.002 | 53.00% | 5,038 | |
4k random read/write (70%/30%) | 3 | 64 | 561,000 | 2,191 | 7.768 | 52.00% | 5,394 | |
4k random write | 3 | 2 | 139,000 | 541 | 0.859 | 33.00% | 2,106 | |
4k random write | 3 | 4 | 192,000 | 751 | 1.246 | 39.00% | 2,462 | |
4k random write | 3 | 8 | 238,000 | 928 | 2.018 | 44.00% | 2,705 | |
4k random write | 3 | 16 | 260,000 | 1,015 | 3.689 | 44.00% | 2,955 | |
4k random write | 3 | 32 | 167,000 | 653 | 11.476 | 27.00% | 3,093 | |
64k random read | 3 | 2 | 160,000 | 10,000 | 0.749 | 35.00% | ||
64k random read | 3 | 4 | 200,000 | 12,500 | 1.205 | 39.00% | ||
64k random read | 3 | 8 | 210,000 | 13,125 | 2.299 | 40.00% | ||
64k random read | 3 | 16 | 228,000 | 14,250 | 4.203 | 41.00% | ||
64k random read | 3 | 32 | 233,000 | 14,562 | 8.343 | 41.00% | ||
64k random write | 3 | 1 | 44,000 | 2,751 | 1.350 | 25.00% | ||
64k random write | 3 | 2 | 51,900 | 3,242 | 2.311 | 27.00% | ||
64k random write | 3 | 4 | 58,300 | 3,645 | 4.108 | 28.00% | ||
64k random write | 3 | 8 | 62,400 | 3,900 | 7.689 | 29.00% | ||
64k random write | 3 | 16 | 63,600 | 3,975 | 12.070 | 29.00% | ||
64k random write | 3 | 32 | 63,800 | 3,987 | 30.150 | 29.00% | ||
1024k read | 1 | 1 | 10,000 | 10,000 | 1.998 | 26.00% | ||
1024k read | 1 | 2 | 12,400 | 12,400 | 3.225 | 29.00% | ||
1024k read | 1 | 4 | 14,100 | 14,100 | 5.668 | 31.00% | ||
1024k read | 1 | 8 | 15,200 | 15,200 | 10.574 | 32.00% | ||
1024k read | 1 | 16 | 15,600 | 15,600 | 20.625 | 33.00% | ||
1024k write | 1 | 1 | 3,443 | 3,443 | 5.804 | 24.00% | ||
1024k write | 1 | 2 | 3,903 | 3,903 | 10.241 | 25.00% | ||
1024k write | 1 | 4 | 4,086 | 4,086 | 19.561 | 25.00% | ||
1024k write | 1 | 8 | 4,156 | 4,156 | 38.492 | 25.00% |
Overall, StarWind VSAN shows great performance at 4k random read/write patterns, consistent read and write performance regardless of VM location, and impressive capacity efficiency at 40%.
Microsoft Storage Spaces Direct over TCP scenario (Nested mirror-accelerated parity: Mirror-only)
The next table presents S2D’s performance with a Nested mirror-accelerated parity configuration, focusing on workloads in the mirror tier.
For 4k random read scenarios, IOPS peak at 2,653,000, showcasing exceptional read performance due to local reading. In the 4k random read/write (70/30) pattern, results reach up to 654,000 IOPS.
The 64k random read/write and 1M read/write tests maintain high throughput, with 53.5 GB/s for 64k reads and 52.4 GB/s for 1M reads. S2D shows exceptional read performance when VMs are on the volume-owning node, and robust write performance within the mirror tier. However, read performance declines if the local-read requirements aren't met, and unusual performance drops occur at certain queue depths. Additionally, write performance can drop if VMs run on a node that is not the volume owner, necessitating careful monitoring.
VM count | Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) | Node CPU usage % | IOPS per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 833,000 | 3,256 | 0.286 | 27.00% | 15,426 |
4k random read | 3 | 8 | 752,000 | 2,937 | 0.648 | 21.00% | 17,905 | |
4k random read | 3 | 16 | 1,083,000 | 4,230 | 0.884 | 29.00% | 18,672 | |
4k random read | 3 | 32 | 1,646,000 | 6,429 | 1.165 | 41.00% | 20,073 | |
4k random read | 3 | 64 | 2,344,000 | 9,158 | 1.637 | 54.00% | 21,704 | |
4k random read | 3 | 128 | 2,653,000 | 10,363 | 2.897 | 67.00% | 19,799 | |
4k random read/write (70%/30%) | 3 | 2 | 324,300 | 1,266 | 0.382 | 20.00% | 8,108 | |
4k random read/write (70%/30%) | 3 | 4 | 114,600 | 447 | 2.103 | 7.00% | 8,186 | |
4k random read/write (70%/30%) | 3 | 8 | 62,800 | 245 | 7.659 | 4.00% | 7,850 | |
4k random read/write (70%/30%) | 3 | 16 | 509,000 | 1,988 | 1.939 | 25.00% | 10,180 | |
4k random read/write (70%/30%) | 3 | 32 | 614,000 | 2,398 | 3.564 | 31.00% | 9,903 | |
4k random read/write (70%/30%) | 3 | 64 | 654,000 | 2,554 | 6.899 | 34.00% | 9,618 | |
4k random write | 3 | 2 | 80,300 | 314 | 1.499 | 9.00% | 4,461 | |
4k random write | 3 | 4 | 46,800 | 183 | 5.116 | 6.00% | 3,900 | |
4k random write | 3 | 8 | 34,800 | 136 | 13.788 | 4.00% | 4,350 | |
4k random write | 3 | 16 | 64,700 | 253 | 14.876 | 7.00% | 4,621 | |
4k random write | 3 | 32 | 186,000 | 728 | 10.174 | 18.00% | 5,167 | |
64k random read | 3 | 2 | 317,000 | 19,812 | 0.376 | 17.00% | ||
64k random read | 3 | 4 | 498,000 | 31,125 | 0.478 | 25.00% | ||
64k random read | 3 | 8 | 424,000 | 26,500 | 1.142 | 22.00% | ||
64k random read | 3 | 16 | 623,000 | 38,937 | 1.539 | 27.00% | ||
64k random read | 3 | 32 | 856,000 | 53,500 | 2.243 | 38.00% | ||
64k random write | 3 | 1 | 85,700 | 5,355 | 0.693 | 14.00% | ||
64k random write | 3 | 2 | 58,300 | 3,645 | 2.055 | 10.00% | ||
64k random write | 3 | 4 | 32,300 | 2,019 | 7.435 | 5.00% | ||
64k random write | 3 | 8 | 23,300 | 1,457 | 20.592 | 4.00% | ||
64k random write | 3 | 16 | 41,800 | 2,616 | 22.939 | 6.00% | ||
64k random write | 3 | 32 | 86,300 | 5,393 | 22.138 | 14.00% | ||
1024k read | 1 | 1 | 19,900 | 19,900 | 1.002 | 5.00% | ||
1024k read | 1 | 2 | 31,600 | 31,600 | 1.267 | 7.00% | ||
1024k read | 1 | 4 | 43,700 | 43,700 | 1.825 | 11.00% | ||
1024k read | 1 | 8 | 50,300 | 50,300 | 3.180 | 14.00% | ||
1024k read | 1 | 16 | 52,400 | 52,400 | 6.098 | 16.00% | ||
1024k write | 1 | 1 | 8,290 | 8,290 | 2.400 | 8.00% | ||
1024k write | 1 | 2 | 8,693 | 8,693 | 4.614 | 9.00% | ||
1024k write | 1 | 4 | 8,607 | 8,607 | 9.290 | 9.00% | ||
1024k write | 1 | 8 | 8,559 | 8,559 | 18.684 | 9.00% |
Microsoft Storage Spaces Direct over TCP scenario (Nested mirror-accelerated parity: Both Tiers)
The performance metrics for the dual-tier configuration in S2D highlight workload management across both mirror and parity tiers.
In 4k random read patterns, IOPS reach up to 2,500,000, showcasing excellent scalability. The 4k random read/write (70/30) pattern results show up to 247,000 IOPS.
For 64k random read/write and 1M read/write tests, the system maintains strong throughput, with 52,000 MiB/s for 64k reads and 50,100 MiB/s for 1M reads, demonstrating S2D’s robust capability to handle complex data operations across tiers. However, its write performance drops when workloads exceed the mirror tier.
VM count | Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) | Node CPU usage % | IOPS per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 814,000 | 3,179 | 0.293 | 27.00% | 15,074 |
4k random read | 3 | 8 | 739,000 | 2,886 | 0.642 | 26.00% | 14,212 | |
4k random read | 3 | 16 | 1,003,000 | 3,918 | 0.956 | 29.00% | 17,293 | |
4k random read | 3 | 32 | 1,556,000 | 6,078 | 1.232 | 42.00% | 18,524 | |
4k random read | 3 | 64 | 2,190,000 | 8,554 | 1.749 | 55.00% | 19,909 | |
4k random read | 3 | 128 | 2,500,000 | 9,766 | 3.068 | 68.00% | 18,382 | |
4k random read/write (70%/30%) | 3 | 2 | 126,200 | 492 | 1.245 | 27.00% | 2,337 | |
4k random read/write (70%/30%) | 3 | 4 | 108,200 | 422 | 2.442 | 23.00% | 2,352 | |
4k random read/write (70%/30%) | 3 | 8 | 49,800 | 195 | 9.766 | 10.00% | 2,490 | |
4k random read/write (70%/30%) | 3 | 16 | 225,800 | 882 | 5.690 | 37.00% | 3,051 | |
4k random read/write (70%/30%) | 3 | 32 | 247,000 | 965 | 11.251 | 38.00% | 3,250 | |
4k random read/write (70%/30%) | 3 | 64 | 231,400 | 903 | 25.634 | 33.00% | 3,506 | |
4k random write | 3 | 2 | 51,400 | 201 | 2.324 | 24.00% | 1,071 | |
4k random write | 3 | 4 | 58,600 | 229 | 4.094 | 26.00% | 1,127 | |
4k random write | 3 | 8 | 58,600 | 229 | 8.170 | 26.00% | 1,127 | |
4k random write | 3 | 16 | 74,800 | 292 | 12.868 | 29.00% | 1,290 | |
4k random write | 3 | 32 | 74,900 | 293 | 25.659 | 29.00% | 1,291 | |
64k random read | 3 | 2 | 316,000 | 19,750 | 0.378 | 18.00% | ||
64k random read | 3 | 4 | 488,000 | 30,500 | 0.490 | 26.00% | ||
64k random read | 3 | 8 | 377,000 | 23,560 | 1.296 | 22.00% | ||
64k random read | 3 | 16 | 601,000 | 37,562 | 1.596 | 27.00% | ||
64k random read | 3 | 32 | 832,000 | 52,000 | 2.307 | 38.00% | ||
64k random write | 3 | 1 | 14,700 | 919 | 4.078 | 13.00% | ||
64k random write | 3 | 2 | 15,200 | 950 | 7.883 | 17.00% | ||
64k random write | 3 | 4 | 14,400 | 900 | 16.656 | 17.00% | ||
64k random write | 3 | 8 | 14,600 | 913 | 32.938 | 17.00% | ||
64k random write | 3 | 16 | 14,700 | 919 | 65.238 | 18.00% | ||
64k random write | 3 | 32 | 14,400 | 900 | 132.694 | 18.00% | ||
1024k read | 1 | 1 | 19,900 | 19,900 | 1.002 | 5.00% | ||
1024k read | 1 | 2 | 31,600 | 31,600 | 1.230 | 8.00% | ||
1024k read | 1 | 4 | 42,400 | 42,400 | 1.882 | 11.00% | ||
1024k read | 1 | 8 | 47,600 | 47,600 | 3.363 | 13.00% | ||
1024k read | 1 | 16 | 50,100 | 50,100 | 6.379 | 16.00% | ||
1024k write | 1 | 1 | 1,482 | 1,482 | 13.496 | 4.00% | ||
1024k write | 1 | 2 | 1,573 | 1,573 | 25.448 | 4.00% | ||
1024k write | 1 | 4 | 2,295 | 2,295 | 34.817 | 5.00% | ||
1024k write | 1 | 8 | 2,187 | 2,187 | 73.178 | 5.00% |
Overall, S2D shows exceptional performance in both test cases; however, its storage capacity efficiency is about 35.7% and could be even lower if additional space is reserved for in-place repairs.
Benchmarking results in graphs
With all benchmarks completed and data collected, we can now compare the results using graphical charts for a clearer understanding.
4k random read:
Let’s start with the 4K random read test, where Figure 1 showcases the performance in IOPS. The S2D in the Nested mirror-accelerated parity configuration with workload in the mirror tier reaches a remarkable 833,000 IOPS at 4 IO depth, scaling up to 2,653,000 IOPS at 128 IO depth.
Comparatively, the StarWind VSAN NVMe-oF HA scenario peaks at 881,000 IOPS at 128 IO depth. Here, S2D outshines StarWind VSAN with approximately 200% more IOPS at higher depths.
So, what’s the magic behind S2D’s performance boost? It’s all about local reading. In a cluster shared volume (CSV) setup, S2D leverages the SMB 3.0 protocol to allow multiple hosts to access and perform I/O operations on a shared volume (if you want to explore this topic in more detail, please read here or check this article). If a VM is running on the node that owns the volume, it can read data directly from the local disk, bypassing the network stack. This local read path minimizes latency and maximizes performance, leading to impressive IOPS numbers.
However, there’s a catch. This local reading perk only works if the VM is on the volume-owning node. If not, the read operations have to go through the network to the owning node, which can slow things down. To keep things running smoothly, you need to keep an eye on where your VMs are running and move them to the appropriate nodes as necessary.
When it comes to random read latency, Figure 2 reveals that Storage Spaces Direct with the workload within the mirror tier also excels, starting at a low 0.286 ms at 4 IO depth and increasing to 2.897 ms at 128 IO depth. StarWind VSAN NVMe-oF begins at 0.570 ms, reaching 8.827 ms at the same depth.
Even with the workload split between the mirror and parity tiers, S2D maintains superior read latency, starting at 0.293 ms and peaking at 3.068 ms. The latency advantage of S2D is again attributed to local reads.
Switching gears to efficiency, Figure 3 compares IOPS per 1% CPU usage during a 4k random read test. Storage Spaces Direct with the workload in mirror tier proves highly efficient, delivering up to 21,704 IOPS per 1% CPU usage at 64 IO depth, whereas StarWind VSAN NVMe-oF peaks at 7,342 IOPS per 1% CPU usage at 128 IO depth. This makes S2D approximately 196% more efficient.
Even when the workload spans both S2D tiers, it maintains a strong efficiency advantage, reaching 19,909 IOPS per 1% CPU usage at 64 IO depth.
4k random read/write 70/30:
Next, let’s dive into mixed 70/30 read-write patterns. Figure 4 is key for understanding real-world performance because pure read or write workloads are rare in actual production.
Figure 4 shows the number of IOPS during the mixed 70%/30% 4k random read/write tests with Numjobs = 3.
Interestingly, with Storage Spaces Direct, there’s a noticeable drop in performance at queue depths 4 and 8. This performance drop is not observed in StarWind VSAN tests. StarWind maintains consistent performance, hitting 334,000 IOPS at queue depth 4 and 301,200 IOPS at queue depth 8.
In contrast, with the workload in the mirror tier, S2D’s performance drops to 114,600 IOPS at queue depth 4 and 62,800 IOPS at queue depth 8, representing reductions of approximately 65.7% and 79.1%, respectively, compared to StarWind VSAN. Fortunately, S2D shows a significant rebound in performance starting at QD=16, ultimately scoring about 25% higher than StarWind VSAN under the same conditions.
However, when the workload is distributed across both mirror and parity tiers, S2D struggles due to ReFS continuously moving new data from the mirror tier to the parity tier, which negatively impacts performance. As a result, S2D records 126,200 IOPS at queue depth 2, drops to a low of 49,800 IOPS at QD=8, and then peaks at 247,000 IOPS at queue depth 32.
Meanwhile, StarWind VSAN outperforms S2D, achieving 241,500 IOPS at queue depth 2 and 534,000 IOPS at queue depth 32, highlighting its superior performance with IOPS figures that are 91.4% higher at queue depth 2 and 116.2% higher at queue depth 32 compared to S2D in the “dual-tier” scenario.
Figure 5 examines latency for the mixed 4K random 70/30 workload. Storage Spaces Direct with the workload in mirror tier starts at 0.382 ms at 2 IO depth, reaching 6.899 ms at 64 IO depth. StarWind VSAN NVMe-oF, on the other hand, starts at 0.582 ms and goes up to 7.768 ms.
Figure 6 explores the number of IOPS relative to 1% CPU utilization during the same mixed workload.
Storage Spaces Direct with workload within the mirror tier provides up to 10,180 IOPS per 1% CPU usage at 16 IO depth, while StarWind VSAN NVMe-oF peaks at 5,394 IOPS per 1% CPU usage. This makes S2D about 89% more efficient.
When the workload touches both tiers, S2D achieves a maximum of 3,506 IOPS per 1% CPU usage, about 35% lower than the StarWind VSAN NVMe-oF HA scenario's figure at 64 IO depth.
4k random write:
The ability to maintain consistent write performance across various queue depths is crucial for demanding virtualization environments. Figure 7 shows the amount of IOPS during 4k random write operations.
StarWind VSAN stands out with consistent performance across most queue depths, significantly outperforming S2D (workload in mirror tier) from IO depth 2 to 16.
With S2D and the workload in the mirror tier, there's a big drop in performance at queue depths 4 and 8. At queue depth 4, Storage Spaces Direct scores about 46,800 IOPS, more than 4 times lower than StarWind VSAN's 192,000 IOPS. At queue depth 8, the gap widens even further, with StarWind VSAN ending up 584% more effective. This result is unexpected, as we anticipated better performance from S2D in this test compared to when the workload spans both mirror and parity tiers.
Interestingly, at queue depth 32 StarWind VSAN loses its advantage, scoring 167,000 IOPS, while S2D with data in the mirror tier gains traction, achieving 186,000 IOPS. That being said, when the workload hits both tiers, S2D cannot improve on its figures and tops out at 74,900 IOPS.
Latency for 4K random writes is also a critical factor. Since latency corresponds to prior IOPS results, the overall picture remains consistent in Figure 8.
StarWind VSAN NVMe-oF demonstrates the lowest latency, starting at 0.859 ms at a 2 IO depth and increasing to 3.689 ms at a 16 IO depth. Virtual SAN significantly outperforms Storage Spaces Direct with the workload in the mirror tier, which starts at 1.499 ms and rises to 14.876 ms at QD=16; StarWind's latency is 43% and 75% lower, respectively.
When comparing StarWind to S2D with the workload in both tiers, the gap is even more pronounced: StarWind shows 63% lower latency at a 2 IO depth (0.859 ms vs. 2.324 ms) and 55% lower latency at a 32 IO depth (11.476 ms vs. 25.659 ms). The one exception is the QD=32 comparison against S2D's mirror-tier configuration, where StarWind's 11.476 ms is about 11% higher than S2D's 10.174 ms.
Efficiency in 4K random write workloads is measured in IOPS per 1% CPU usage, as shown in Figure 9.
Storage Spaces Direct with workload in mirror tier achieves up to 5,167 IOPS per 1% CPU usage, while StarWind VSAN NVMe-oF peaks at 3,093 IOPS per 1% CPU usage, making S2D approximately 67% more efficient. However, when the workload utilizes both S2D tiers, efficiency drops significantly, with a maximum of only 1,291 IOPS per 1% CPU usage, making it the least efficient of the three scenarios.
64k random read:
Moving to larger data blocks, Figure 10 illustrates the throughput performance for 64K random reads.
Storage Spaces Direct with workload in mirror tier significantly outpaces the StarWind VSAN NVMe-oF HA scenario, achieving a peak of 53,500 MiB/s at 32 IO depth compared to StarWind’s 14,562 MiB/s. This indicates that S2D delivers approximately 267% more throughput.
When the workload utilizes both S2D tiers, it shows slightly lower throughput but still surpasses StarWind VSAN significantly. The higher performance in S2D is attributed to local reads, but remember, this efficiency is conditional on the VM running on the node that owns the volume. StarWind VSAN, in contrast, provides stable performance regardless of VM placement, eliminating the need for additional monitoring and VM binding.
Figure 11 shows the latency for 64K random reads. The results align with the throughput data discussed earlier.
Here, S2D with the workload in the mirror tier maintains low latency due to local reads, starting at 0.376 ms and reaching 2.243 ms at a 32 IO depth. The StarWind VSAN NVMe-oF scenario starts higher at 0.749 ms and peaks at 8.343 ms, meaning S2D's latency is up to 73% lower.
In Figure 12, we examine CPU usage during 64K random reads.
Storage Spaces Direct with the workload in the mirror tier starts at 17% CPU usage at a 2 IO depth and peaks at 38% at a 32 IO depth. When the workload hits both tiers, S2D shows consistent CPU usage trends, closely following the "mirror-only" test results.
The StarWind VSAN NVMe-oF scenario begins significantly higher at 35% and peaks at 41%, slightly above S2D. This indicates that S2D is more efficient in CPU usage, with StarWind VSAN using approximately 106% more CPU at a 2 IO depth and about 8% more at a 32 IO depth.
64k random write:
Figure 13 illustrates the 64K random write throughput, highlighting performance differences across three scenarios.
Storage Spaces Direct with workload in the mirror tier exhibits erratic performance, with notable drops at medium IO depths. For example, throughput falls to 2,019 MiB/s at a 4 IO depth, dips further to 1,457 MiB/s at 8 IO depth, and then rebounds to 2,616 MiB/s at 16 IO depth. This pattern mirrors the behavior observed in 4K random write tests.
In contrast, StarWind VSAN delivers more consistent performance, surpassing S2D by 79.6% at a 4 IO depth, by 167.6% at 8 IO depth, and by 52% at 16 IO depth.
When workloads span both tiers, S2D shows significantly lower throughput across the board. StarWind VSAN outperforms S2D by 199% at a 1 IO depth, with the performance gap widening to 343% at a 32 IO depth.
This highlights StarWind’s capability to handle write operations with consistently high performance across varying IO depths.
Figure 14 displays the latency for 64K random writes, showing a similar trend.
StarWind VSAN delivers faster response times at lower IO depths (4, 8, and 16), but S2D (with workloads in the mirror tier) takes the lead at a 32 IO depth, achieving a lower latency of 22.138 ms compared to StarWind’s 30.150 ms.
When compared to S2D’s configuration with workloads spread across both tiers, StarWind VSAN is significantly more efficient, providing 66.9% faster response times at a 1 IO depth and 77.3% lower latency at a 32 IO depth.
Figure 15 highlights CPU usage during 64K random writes.
StarWind VSAN consistently shows higher CPU utilization compared to both Storage Spaces Direct configurations. At a 1 IO depth, StarWind uses 25% CPU, which is 79% higher than S2D’s 14% with the workload in the mirror tier and 92% higher than S2D’s 13% in the mixed-tier test. This trend persists across different IO depths, with StarWind maintaining higher CPU usage but delivering more consistent performance under varying workloads.
1M read:
Figure 16 presents the throughput results for 1024K reads, where S2D with workload in the mirror tier significantly outperforms StarWind VSAN, reaching 52,000 MiB/s at a 16 IO depth compared to StarWind’s 15,600 MiB/s — about 233% higher throughput.
Even when workloads are spread across both tiers, S2D continues to outperform StarWind VSAN by a substantial margin. This impressive read performance from S2D is again due to local reads when the test VM is located on the CSV owner node.
Figure 17 shows the latency results during the 1024K read test, reflecting a pattern similar to the throughput results.
S2D with workloads in the mirror tier demonstrates impressively low latency, benefiting from local reads. Latency starts at 1.002 ms and increases to 6.098 ms as IO depth grows.
In contrast, StarWind VSAN starts at 1.998 ms and peaks at 20.625 ms, meaning StarWind's latency runs up to 240% higher than S2D's.
When workloads are distributed across both S2D tiers, latency remains nearly identical to that of the mirror tier tests.
Figure 18 highlights CPU usage during 1024K reads, where S2D demonstrates significantly lower resource consumption compared to StarWind VSAN.
With workloads in the mirror tier, S2D starts at 5% CPU usage at a 1 IO depth and increases to 16% at a 16 IO depth.
In contrast, StarWind VSAN begins at 26% and rises to 33%, meaning S2D uses about 63% less CPU on average.
Even when workloads span both tiers, S2D maintains the same CPU usage levels as in the mirror-only benchmarks. S2D’s efficiency in local reads translates to more effective CPU usage, while StarWind requires more resources to sustain consistent performance.
1M write:
When we shift our focus to 1024K sequential write throughput, Figure 19 reveals that Storage Spaces Direct (S2D) with a workload in the mirror tier holds a significant performance advantage over StarWind VSAN. Specifically, S2D reaches 8,693 MiB/s at a 2 IO depth, while StarWind VSAN manages 3,903 MiB/s. At an 8 IO depth, S2D continues to dominate, hitting 8,559 MiB/s, compared to StarWind’s 4,156 MiB/s. This means S2D delivers approximately 122% higher throughput—if your workload is optimized for the mirror tier.
However, it’s important to note that this advantage is conditional. If the workload doesn’t fit into the mirror tier, S2D’s performance drops dramatically. This is evident in the multi-tiered test results, where StarWind VSAN outperforms Storage Spaces Direct by about 106% on average.
Diving into the 1024K write latency as depicted in Figure 20, we see a consistent theme.
With its workload in the mirror tier, Storage Spaces Direct begins at a brisk 2.400 ms and climbs to 18.684 ms at 8 IO depth. In comparison, StarWind VSAN starts at a slower 5.804 ms and escalates to a higher peak of 38.492 ms, meaning StarWind's latency runs up to 106% higher than S2D's.
However, when the workload spans both tiers, the scenario shifts. S2D records the highest latencies, starting at 13.496 ms and surging to 73.178 ms at 8 IO depth. This again indicates a significant performance shift depending on how well the workload is aligned with S2D’s optimal tier configuration.
Figure 21 highlights CPU usage during 1024K writes. When running the workload in the mirror tier, Storage Spaces Direct (S2D) starts with 8% CPU usage at 1 IO depth and consistently holds at 9% across 2, 4, and 8 IO depths.
In contrast, StarWind VSAN begins at a much higher 24% and remains steady around 25% across all IO depths. This indicates that S2D consumes approximately 72% less CPU on average, demonstrating significantly more efficient resource utilization compared to StarWind VSAN.
When the workload spans both tiers of S2D, it continues to exhibit even lower CPU usage, starting at just 4% at 1 and 2 IO depths and modestly rising to 5% at 4 and 8 IO depths.
Additional benchmarking: 1 VM, 1 numjobs, 1 iodepth.
To gain a deeper understanding of how StarWind VSAN and Storage Spaces Direct perform under specific synthetic conditions, we conducted additional benchmarks focusing on a single VM scenario, with numjobs=1 and an IO depth of 1. This is the classic way to measure raw storage access latency under the most favorable conditions.
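The article doesn't publish its exact job definitions, so purely as an illustration, a minimal fio job reproducing this numjobs=1, iodepth=1 pattern might look like the following. The device path, runtime, and job name are placeholders, not the actual test parameters:

```ini
[global]
ioengine=libaio     # Linux async IO engine inside the guest (assumption)
direct=1            # bypass the page cache to hit the storage path
time_based
runtime=60

[lat-4k-randread]
filename=/dev/sdX   # placeholder for the test device inside the VM
rw=randread         # the 4k random read pattern from the table below
bs=4k
numjobs=1           # single worker
iodepth=1           # queue depth 1: completion time per IO equals latency
```

At a queue depth of 1 there is never more than one IO in flight, so the reported average completion latency is a clean measure of the full datapath, with no queuing effects.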
Benchmark results in a table
StarWind VSAN NVMe-oF HA (TCP) – Host mirroring + MDRAID5 (1 VM)

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 1,112 | 4 | 0.897 |
| 4k random write | 1 | 1 | 501 | 2 | 1.991 |
| 4k random write (synchronous) | 1 | 1 | 226 | 1 | 4.415 |

Storage Spaces Direct (TCP) – Nested mirror-accelerated parity – Data in mirror tier (1 VM)

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 7,221 | 28 | 0.137 |
| 4k random write | 1 | 1 | 5,456 | 21 | 0.182 |
| 4k random write (synchronous) | 1 | 1 | 2,887 | 11 | 0.344 |

Storage Spaces Direct (TCP) – Nested mirror-accelerated parity – Data in mirror and parity tiers (1 VM)

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 5,920 | 23 | 0.167 |
| 4k random write | 1 | 1 | 2,517 | 10 | 0.395 |
| 4k random write (synchronous) | 1 | 1 | 1,772 | 7 | 0.562 |
Benchmark results in graphs
This section presents visual comparisons of the performance and latency metrics across storage configurations under research.
4k random read:
Figure 1 demonstrates IOPS for the 4K random read test at 1 IO depth and with numjobs=1. S2D with the workload in the mirror tier outperforms the StarWind VSAN NVMe-oF HA scenario, delivering 7,221 IOPS.
This remarkable 550% increase over StarWind’s 1,112 IOPS is primarily due to S2D’s ability to leverage local reads and operate at the host level. In contrast, StarWind VSAN, running inside a VM, encounters a much longer datapath, which negatively impacts its performance.
Even when S2D operates with data across both mirror and parity tiers, it maintains strong performance at 5,920 IOPS, still surpassing StarWind by 432%.
Latency metrics for the 4K random read test at 1 IO depth, as shown in Figure 2, similarly favor Storage Spaces Direct with the workload in the mirror tier, which records a swift 0.137 ms; StarWind VSAN's 0.897 ms is a substantial 553% higher. This advantage is again due to S2D's local read capabilities and direct host-level operation. Even in a mixed-tier setup, S2D maintains its lead with a latency of 0.167 ms, with StarWind's result still about 437% higher.
Figure 3 showcases the results of the 4K random write test at IO depth=1 with a numjob=1. Storage Spaces Direct with the workload in the mirror tier achieves a remarkable 5,456 IOPS — an astounding 990% higher than StarWind VSAN’s 501 IOPS. This significant advantage stems from S2D’s ability to write directly to the mirror tier, bypassing the resource-intensive parity calculation.
However, when S2D handles workloads across both mirror and parity tiers, performance drops to 2,517 IOPS due to the additional overhead of invalidating data in the parity tier. For a deeper dive into how reading and writing function in a Nested mirror-accelerated parity scenario, please refer to the detailed explanation provided here.
On the other hand, StarWind VSAN, which writes directly to the MDRAID5 array, experiences performance degradation due to the read-modify-write (RMW) operations and the extended IO datapath inherent in its VM-based operation. Despite these technical challenges, the performance of StarWind VSAN at QD=1 appears unusually low, prompting us to initiate an investigation into this issue to uncover the underlying cause.
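To illustrate the RMW penalty mentioned above: each sub-stripe random write to a RAID5 array typically costs four backend disk IOs (read old data, read old parity, write new data, write new parity). A minimal back-of-envelope model follows; the drive count and per-drive IOPS are our own illustrative assumptions, not figures measured in this test:

```python
# Back-of-envelope model of the RAID5 small-write penalty (read-modify-write).
# Illustrative only: drive counts and per-drive IOPS are assumptions,
# not values measured in this benchmark.

def raid5_effective_write_iops(per_drive_iops: int, drives: int) -> float:
    """Upper bound on small random-write IOPS for a RAID5 array.

    Each sub-stripe write triggers 4 backend IOs: read old data,
    read old parity, write new data, write new parity.
    """
    return per_drive_iops * drives / 4

# 5 NVMe drives at a hypothetical 100k random-write IOPS each:
print(raid5_effective_write_iops(100_000, 5))  # 125000.0
```

In other words, even before host mirroring and the VM-based datapath are accounted for, small writes to the MDRAID5 array can reach at most a quarter of the drives' aggregate write IOPS, which is why mirror-tier writes (no parity math) look so much faster in these low-depth tests.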
Moving on to Figure 4, we examine the latency metrics for 4K random writes.
No surprises here. Storage Spaces Direct (S2D) continues to deliver superior performance, benefiting from its efficient data handling within the mirror tier and achieving an impressively low latency of 0.182 ms, against which StarWind VSAN's 1.991 ms is a staggering 995% higher.
Even when S2D operates across both mirror and parity tiers, it maintains a competitive latency of 0.395 ms, with StarWind's result still 404% higher.
In our synchronous 4K RW single-threaded IO tests, as shown in Figure 5, Storage Spaces Direct with the dataset in the mirror tier once again takes the lead, achieving 2,887 IOPS — a staggering 1,177% increase over StarWind VSAN’s 226 IOPS.
This significant performance boost is attributed to the same factors observed in asynchronous 4K random write test, where S2D benefits from direct writes to the mirror tier, effectively bypassing the resource-heavy parity calculations.
Even in the “mixed tiers” setup, S2D maintains a strong advantage, delivering 1,772 IOPS and still outpacing StarWind by 684%.
Figure 6 highlights the latency results for synchronous 4K RW single-threaded IO, further confirming S2D’s performance edge.
With the workload within the mirror tier, Storage Spaces Direct achieves a write latency of 0.344 ms, against which StarWind VSAN's 4.415 ms is an impressive 1,184% higher.
Even when using the mirror and parity tiers, S2D maintains a strong latency advantage at 0.562 ms, with StarWind's result about 686% higher. This superior performance stems from S2D's efficient data handling, consistently delivering lower latency across varying configurations.
Conclusion
To sum it up, both Storage Spaces Direct and StarWind VSAN come with their own set of perks and trade-offs for your IT infrastructure.
Storage Spaces Direct shines in read performance, particularly when virtual machines are aligned with CSV owner nodes. However, we observed some unexpected performance issues during 4K and 64K random-write tests, where S2D sometimes underperformed with data in the mirror tier compared to when it spanned both mirror and parity tiers. This highlights the need for careful monitoring to ensure optimal performance. Mismanagement of workloads can lead to significant performance drops, particularly at certain queue depths. Additionally, S2D requires extra space for fault tolerance, which can impact overall capacity efficiency.
On the other hand, StarWind VSAN proves to be a solid choice for high-performance environments, especially with mixed read/write or write-heavy workloads. It consistently delivers superior write performance under load, regardless of VM placement, and offers better capacity efficiency. However, StarWind VSAN lacks the local read boost that S2D provides, can be more demanding on CPU resources, and showed some anomalies in single-threaded tests.
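The capacity-efficiency trade-off can be sketched with simple ratios. Everything below is our own assumption (5 drives per node, 2 nodes, single parity within a node, two-way mirroring across nodes, and an illustrative 20% mirror-tier share for S2D), not values reported in this article:

```python
# Rough capacity-efficiency comparison of the two layouts under the
# stated assumptions. Illustrative sketch, not measured data.

def starwind_mirror_raid5(drives_per_node: int) -> float:
    """Host mirroring (2 copies) on top of a per-node RAID5 array."""
    raid5 = (drives_per_node - 1) / drives_per_node  # parity costs one drive
    return raid5 / 2                                 # halved by 2-node mirror

def s2d_nested_map(drives_per_node: int, mirror_fraction: float) -> float:
    """Nested mirror-accelerated parity: blend of a nested two-way mirror
    tier (4 data copies, 25% efficient) and a nested parity tier,
    approximated here as (n - 1) / 2n per node."""
    mirror = 0.25
    parity = (drives_per_node - 1) / (2 * drives_per_node)
    return mirror_fraction * mirror + (1 - mirror_fraction) * parity

print(starwind_mirror_raid5(5))          # 0.4 -> 40% usable capacity
print(round(s2d_nested_map(5, 0.2), 3))  # 0.37 -> ~37% usable capacity
```

Under these assumptions the StarWind layout lands at roughly 40% usable capacity, while nested mirror-accelerated parity sits a few points lower because every byte in the mirror tier exists as four copies; the larger the mirror tier, the lower the overall efficiency.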
So, if you’re looking for exceptional read performance and don’t mind keeping a close eye on your workloads, S2D is a great option. But if you want consistent write and mixed-IO performance with better capacity efficiency, StarWind VSAN in an NVMe-oF HA configuration is the way to go.
Stay tuned for our upcoming articles, where we’ll dive deeper into these solutions to give you a bigger picture of how they can fit into your IT strategy.