
StarWind Virtual SAN (VSAN) vs Microsoft Storage Spaces Direct (S2D), Part 1: Hyper-V HCI Performance Benchmarking (TCP)

  • August 22, 2024
  • 62 min read
StarWind DevOps Team Lead. Volodymyr possesses broad expertise in virtualization, storage, and networking, with exceptional experience in architecture planning, storage protocols, hardware sourcing, and research.

Introduction

I/O is what either makes or breaks the system. Choosing the right data storage solution is crucial for your system’s performance and reliability. If you’re a Hyper-V fanboy and haven’t spent the last ten years living under a rock, you’re likely aware of both StarWind Virtual SAN (VSAN) and Microsoft Storage Spaces Direct (S2D). So, which one should you go for? In this article, we dive into these two popular options, comparing their performance, storage efficiency, and how they work in a 2-node Hyper-V cluster setup. By the end, you’ll have a better idea of which solution could be the best fit for you.

To compare these two fairly, we set up a 2-node Hyperconverged Infrastructure (HCI) Hyper-V cluster under two different configurations:

  • StarWind VSAN, NVMe-oF, TCP
    • Full host mirroring, essentially a ‘network RAID1’, combined with RAID5 for local NVMe pool protection, making the overall layout effectively RAID51.
  • Microsoft S2D (‘Mirror-accelerated parity’), TCP
    • We tested two specific corner cases: a) the entire test workload is placed in the mirror tier to check the ‘best case’ scenario, and b) the workload is split between the mirror and parity tiers to test a ‘real-world’ scenario. We deliberately skipped c) ‘parity only’, the ‘worst case’, for obvious reasons: it’s simply unusable…

Disclaimer: For simplicity, and because it isn’t a production environment, we used a replicated disk device as a witness. While this setup works fine in a lab, it should be avoided in production, and a physical, out-of-cluster witness is recommended instead.

High-scope interconnect diagram: StarWind VSAN, NVMe-oF, TCP


In this setup, each Hyper-V node is equipped with 5 NVMe drives, passed through to the StarWind Linux-based Controller Virtual Machine (CVM) using the Hyper-V ‘PCI pass-thru’ mechanism. Inside the CVM, these NVMe drives are assembled into a single RAID5 virtual LUN. On top of this LUN, two StarWind High Availability (HA) devices are created, ensuring data replication and continuous availability. StarWind NVMe-oF Initiator was chosen for uplink connectivity due to the lack of a native Microsoft NVMe-oF initiator (Microsoft was expected to introduce full NVMe-oF support in Windows Server 2025, but this hasn’t happened yet…), and it ‘brings’ these StarWind HA devices to the Hyper-V nodes. Two Cluster Shared Volumes (CSVs) are then created on these connected devices, one per Hyper-V node, according to Microsoft’s best practices.

High-scope interconnect diagram: Microsoft S2D (‘Mirror-accelerated parity’), TCP


This scenario covers two use cases, both within the ‘mirror-accelerated parity’ S2D configuration:

  1. Workload placed in the mirror tier to maximize performance by keeping data in the faster mirror tier. This is the ‘best case’, achievable only with a very lightly used (or underutilized) S2D cluster. When testing your own S2D clusters with ‘mirror-accelerated parity,’ make sure this isn’t the only case you test, or you’ll end up in trouble!
  2. Workload placed across both tiers, simulating a more balanced use case where data moves between the mirror and parity tiers. This reflects more ‘real-world’ conditions, where the workload doesn’t fit entirely in the mirror tier, and the Resilient File System (ReFS) starts offloading data to the parity tier. For the sake of the experiment, we also attempted to force a test where writes go directly to the parity tier, the ‘worst case’.

Two storage tiers are set up with different resiliency settings: ‘Mirror’ for ‘performance’ and ‘Parity’ for ‘capacity,’ with the following parameters:

Volumes are allocated with 20% in the mirror tier and 80% in the parity tier, adhering to Microsoft’s recommendations:

ReFS manages the data movement between these tiers to optimize performance. The threshold value, at which ReFS starts moving data between the tiers, was left at the default – 85%.

Capacity efficiency

Storage efficiency is a big deal in any storage solution!

  • StarWind VSAN
    In this configuration (5x NVMe drives per Hyper-V cluster node), it achieves 40% storage efficiency: RAID5 across the 5 local drives keeps 80% of raw capacity, and full host mirroring halves that.
  • Microsoft S2D (‘Mirror-accelerated parity’)
    In this setup, it delivers 35.7% storage efficiency (20% mirror, 80% parity), though this can vary based on the volume percentage allocated to the mirror tier. For more on calculating storage efficiency for mirror-accelerated parity, check out the provided link.
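The two efficiency figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming two-way host mirroring at 50% efficiency, local RAID5 at (n-1)/n, and a parity-tier efficiency of roughly 1/3 (the value implied by the quoted 35.7% figure, not a number stated in the article):

```python
def vsan_efficiency(drives_per_node: int = 5) -> float:
    """StarWind VSAN: RAID5 locally, then 'network RAID1' across hosts."""
    raid5 = (drives_per_node - 1) / drives_per_node  # 4/5 = 80% local
    return raid5 * 0.5                               # mirroring halves it

def s2d_map_efficiency(mirror_frac: float = 0.2,
                       mirror_eff: float = 0.5,
                       parity_eff: float = 1 / 3) -> float:
    """S2D mirror-accelerated parity: raw bytes consumed per usable byte,
    weighted by the volume fraction placed in each tier (parity_eff is an
    assumption back-derived from the quoted 35.7%)."""
    raw_per_usable = mirror_frac / mirror_eff + (1 - mirror_frac) / parity_eff
    return 1 / raw_per_usable

print(round(vsan_efficiency() * 100, 1))     # 40.0
print(round(s2d_map_efficiency() * 100, 1))  # 35.7
```

Changing `mirror_frac` shows why the S2D figure “can vary based on the volume percentage allocated to the mirror tier.”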

Microsoft recommends leaving some space unallocated in the storage pool to allow volumes to perform ‘in-place’ repairs after an individual drive failure. If enough free space is available, an immediate, in-place, parallel repair can restore volumes to full resiliency even before the failed drive is replaced. This process happens automatically. In our setup, the recommended reserved space is 5.82 TB (20% of the total pool size). See:

[Chart: recommended storage pool reserve]
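As a sanity check, the 5.82 TB reserve figure follows from the pool size, assuming the “TB” in the text is actually binary TiB as Windows reports it (a unit assumption on our part, not something the article states):

```python
# 10 drives total (5 per node x 2 nodes), 3.2 TB each in decimal bytes
drive_bytes = 3.2e12
pool_tib = drive_bytes * 10 / 2**40   # convert the raw pool to TiB
reserve_tib = 0.2 * pool_tib          # Microsoft's ~20% reserve guidance

print(round(pool_tib, 2), round(reserve_tib, 2))  # 29.1 5.82
```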

When planning your own S2D deployment, keep these factors in mind to make sure you get the best performance and storage efficiency for your needs!

Testbed Overview:

When evaluating StarWind VSAN and Microsoft S2D storage solutions, we didn’t cut any corners. Our testbed was solid and carefully configured to mirror real-world environments, ensuring our results are relevant, reliable, and reproducible. Here’s a snapshot of the hardware and software components that powered our tests:

Hardware:

Server model Supermicro SYS-220U-TNR
CPU Intel(R) Xeon(R) Platinum 8352Y @2.2GHz
Sockets 2
Cores/Threads 64/128
RAM 256GB
NICs 2x Mellanox ConnectX®-6 EN 200GbE (MCX613106A-VDA)
Storage 5x NVMe Micron 7450 MAX: U.3 3.2TB

Software:

Windows Server Windows Server 2022 Datacenter 21H2 OS build 20348.2527
StarWind VSAN Version V8 (build 15469, CVM 20240530) (kernel – 5.15.0-113-generic)
StarWind NVMe-oF Initiator 2.0.0.672(rev 674).Setup.486

StarWind CVM parameters:

CPU 24 vCPU
RAM 32GB
NICs 1x network adapter for management
4x network adapter for client IO and synchronization
Storage RAID5 (5x NVMe Micron 7450 MAX: U.3 3.2TB)


Testing methodology:

Benchmarks were run using the FIO utility in client/server mode. We set up a total of 20 virtual machines (VMs), with 10 VMs on each server node. Each VM had 4 vCPUs, 8GB of RAM, and three RAW virtual disks connected to separate virtual SCSI controllers.
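To make the setup concrete, a job file along these lines could drive one VM’s 4k random read run. The actual job files weren’t published, so the ioengine, device paths, and section names below are assumptions:

```ini
; Hypothetical fio job for one VM's 4k random read test, run in
; client/server mode: `fio --server` inside each VM, then
; `fio --client=<vm-ip> randread-4k.fio` from the controller host.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=600          ; read tests ran for 600 seconds
group_reporting=1
rw=randread
bs=4k
numjobs=3
iodepth=4

; one section per RAW virtual disk, each on its own SCSI controller
[disk0]
filename=/dev/sdb
[disk1]
filename=/dev/sdc
[disk2]
filename=/dev/sdd
```

The `numjobs` and `iodepth` values correspond to one row of the result tables; the other rows vary `iodepth` (and `bs`/`rw` per pattern).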

Test Scenarios:

  • Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’)
    • Mirror-only. For tests where the workload is placed entirely in the mirror tier, each virtual disk was limited to 10GB.
    • Mirror+Parity. For tests utilizing both the mirror and parity tiers, each virtual disk was set to 100GB.
  • StarWind Virtual SAN
    • For all tests, each virtual disk was 100GB in size.

Data Patterns Tested:

  • 4k random read
  • 4k random read/write (70/30)
  • 4k random write
  • 64k random read
  • 64k random write
  • 1M read
  • 1M write

Pre-Test Warm-Up:

Before running tests, we filled the virtual disks with random data and ‘warmed them up’ with specific patterns to ensure stable flash performance.

  • 4k random read/write (70/30) and 4k random write: VM disks were warmed up with a 4k random write pattern for 4 hours.
  • 64k random write: VM disks were warmed up with a 64k random write pattern for 2 hours.

Test Execution:

  • Duration: Read tests were conducted for 600 seconds, and write tests lasted 1800 seconds.
  • Repetition: All tests were repeated three times, and the average value was used as the final result.

Specific Configurations:

  • Microsoft Storage Spaces Direct (S2D)
    Following Microsoft’s recommendations for the S2D+ReFS scenario, test VMs were placed on the CSV owner node so that ReFS would not redirect requests to another node, ensuring reads are served from local data without crossing the network stack and reducing network utilization on writes. Each VHDX file was placed in a different subdirectory to optimize ReFS metadata operations and reduce latency.
  • StarWind Virtual SAN (VSAN)
    VMs were evenly distributed across hosts without being pinned to the node that owns the volume. Each VHDX file was placed in different subdirectories to maintain consistent performance.

Benchmarking local NVMe performance:

Before diving into the full evaluation, we verified that the NVMe drives live up to the vendor’s claims by running a series of local tests. Here’s the vendor-claimed performance:

[Vendor specification table: Micron 7450 MAX]

Using the FIO utility in client/server mode, we tested the performance of the NVMe SSDs in our server within a local storage setup. We applied different patterns to see how the NVMe SSDs handled various types of data. The results are shown below:

1x NVMe Micron 7450 MAX: U.3 3.2TB
Pattern Numjobs IOdepth IOPs MiB/s Latency (ms)
4k random read 6 32 997,000 3,894 0.192
4k random read/write 70/30 6 16 531,000 2,073 0.142
4k random write 4 4 385,000 1,505 0.041
64k random read 8 8 92,900 5,807 0.688
64k random write 2 1 27,600 1,724 0.072
1M read 1 8 6,663 6,663 1.200
1M write 1 2 5,134 5,134 0.389

Our tests showed that the NVMe drives lived up to what the vendor promised. Whether handling small 4k reads or large 1M writes, they delivered on speed and consistency.
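The IOPS and MiB/s columns in the table above are two views of the same measurement, related by the block size. A quick check (the small mismatches on other rows come from the IOPS figures being rounded):

```python
def mib_per_s(iops: int, block_kib: int) -> float:
    """Throughput in MiB/s implied by an IOPS figure and block size in KiB."""
    return iops * block_kib / 1024

print(int(mib_per_s(997_000, 4)))    # 3894, matching the 4k random read row
print(int(mib_per_s(6_663, 1024)))   # 6663: at 1M blocks, MiB/s equals IOPS
```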

Benchmark results in a table:

The benchmarking results are presented in tables to illustrate performance metrics such as IOPS, throughput (MiB/s), latency (ms), and CPU usage. An additional metric, “IOPS per 1% CPU usage,” highlights the performance dependency on the CPU usage for 4k random read/write patterns. This parameter is calculated using the following formula:

IOPS per 1% CPU usage = IOPS / Node count / Node CPU usage

Where:

  • IOPS represents the number of I/O operations per second for each pattern.
  • Node count is 2 nodes in our case.
  • Node CPU usage denotes the CPU usage of one node during the test.

By incorporating this additional metric, we aimed to provide deeper insights into how CPU usage correlates with IOPS, offering a more nuanced understanding of performance characteristics.
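The formula above is straightforward to apply; here it is checked against the first StarWind VSAN table row (420,000 IOPS across 2 nodes at 45% CPU per node):

```python
def iops_per_cpu_pct(iops: float, node_count: int, node_cpu_pct: float) -> float:
    """'IOPS per 1% CPU usage' metric: IOPS / node count / per-node CPU %."""
    return iops / node_count / node_cpu_pct

print(round(iops_per_cpu_pct(420_000, 2, 45)))  # 4667
```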

Now let’s delve into the detailed benchmark results for each storage configuration.

StarWind Virtual SAN (VSAN)

The table illustrates StarWind VSAN’s performance under various workload patterns and configurations. For 4k random reads, IOPS range from 420,000 at lower queue depths to 881,000 at the highest depth. In a mixed 4k random read/write (70/30) test, it achieves up to 561,000 IOPS, showcasing its prowess in handling mixed workloads.

In the 64k and 1M read/write patterns, StarWind VSAN reaches up to 15.2 GB/s, demonstrating its ability to handle these workloads effectively.

VM count Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
20 4k random read 3 4 420,000 1,641 0.570 45.00% 4,667
4k random read 3 8 307,000 1,200 1.515 35.00% 4,386
4k random read 3 16 546,000 2,134 1.736 50.00% 5,460
4k random read 3 32 741,000 2,895 2.586 57.00% 6,500
4k random read 3 64 836,000 3,265 4.567 58.00% 7,207
4k random read 3 128 881,000 3,442 8.827 60.00% 7,342
4k random read/write (70%/30%) 3 2 241,500 943 0.582 39.00% 3,096
4k random read/write (70%/30%) 3 4 334,000 1,305 0.843 45.00% 3,711
4k random read/write (70%/30%) 3 8 301,200 1,177 1.683 42.00% 3,586
4k random read/write (70%/30%) 3 16 416,000 1,625 2.507 48.00% 4,333
4k random read/write (70%/30%) 3 32 534,000 2,086 4.002 53.00% 5,038
4k random read/write (70%/30%) 3 64 561,000 2,191 7.768 52.00% 5,394
4k random write 3 2 139,000 541 0.859 33.00% 2,106
4k random write 3 4 192,000 751 1.246 39.00% 2,462
4k random write 3 8 238,000 928 2.018 44.00% 2,705
4k random write 3 16 260,000 1,015 3.689 44.00% 2,955
4k random write 3 32 167,000 653 11.476 27.00% 3,093
64k random read 3 2 160,000 10,000 0.749 35.00%
64k random read 3 4 200,000 12,500 1.205 39.00%
64k random read 3 8 210,000 13,125 2.299 40.00%
64k random read 3 16 228,000 14,250 4.203 41.00%
64k random read 3 32 233,000 14,562 8.343 41.00%
64k random write 3 1 44,000 2,751 1.350 25.00%
64k random write 3 2 51,900 3,242 2.311 27.00%
64k random write 3 4 58,300 3,645 4.108 28.00%
64k random write 3 8 62,400 3,900 7.689 29.00%
64k random write 3 16 63,600 3,975 12.070 29.00%
64k random write 3 32 63,800 3,987 30.150 29.00%
1024k read 1 1 10,000 10,000 1.998 26.00%
1024k read 1 2 12,400 12,400 3.225 29.00%
1024k read 1 4 14,100 14,100 5.668 31.00%
1024k read 1 8 15,200 15,200 10.574 32.00%
1024k read 1 16 15,600 15,600 20.625 33.00%
1024k write 1 1 3,443 3,443 5.804 24.00%
1024k write 1 2 3,903 3,903 10.241 25.00%
1024k write 1 4 4,086 4,086 19.561 25.00%
1024k write 1 8 4,156 4,156 38.492 25.00%

Overall, StarWind VSAN shows great performance at 4k random read/write patterns, consistent read and write performance regardless of VM location, and impressive storage efficiency at 40%.

Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’): Mirror-only

The table below shows S2D’s performance with a ‘mirror-accelerated parity’ configuration, focusing on workloads that 100% fit into the mirror tier.

For 4k random read scenarios, IOPS peak at 2,653,000, highlighting exceptional read performance thanks to 100% local-bound reads. In the 4k random read/write (70/30) pattern, results reach up to 654,000 IOPS.

The 64k random read/write and 1M read/write tests maintain high throughput, with 53.5 GB/s for 64k reads and 52.4 GB/s for 1M reads. S2D shows exceptional read performance when VMs are on the volume-owning node, and robust write performance within the mirror tier. However, read performance declines if the local-read requirement isn’t met, and unusual performance drops occur at certain queue depths. Additionally, write performance can drop if VMs are running on a node that is not the volume owner.

VM count Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
20 4k random read 3 4 833,000 3,256 0.286 27.00% 15,426
4k random read 3 8 752,000 2,937 0.648 21.00% 17,905
4k random read 3 16 1,083,000 4,230 0.884 29.00% 18,672
4k random read 3 32 1,646,000 6,429 1.165 41.00% 20,073
4k random read 3 64 2,344,000 9,158 1.637 54.00% 21,704
4k random read 3 128 2,653,000 10,363 2.897 67.00% 19,799
4k random read/write (70%/30%) 3 2 324,300 1,266 0.382 20.00% 8,108
4k random read/write (70%/30%) 3 4 114,600 447 2.103 7.00% 8,186
4k random read/write (70%/30%) 3 8 62,800 245 7.659 4.00% 7,850
4k random read/write (70%/30%) 3 16 509,000 1,988 1.939 25.00% 10,180
4k random read/write (70%/30%) 3 32 614,000 2,398 3.564 31.00% 9,903
4k random read/write (70%/30%) 3 64 654,000 2,554 6.899 34.00% 9,618
4k random write 3 2 80,300 314 1.499 9.00% 4,461
4k random write 3 4 46,800 183 5.116 6.00% 3,900
4k random write 3 8 34,800 136 13.788 4.00% 4,350
4k random write 3 16 64,700 253 14.876 7.00% 4,621
4k random write 3 32 186,000 728 10.174 18.00% 5,167
64k random read 3 2 317,000 19,812 0.376 17.00%
64k random read 3 4 498,000 31,125 0.478 25.00%
64k random read 3 8 424,000 26,500 1.142 22.00%
64k random read 3 16 623,000 38,937 1.539 27.00%
64k random read 3 32 856,000 53,500 2.243 38.00%
64k random write 3 1 85,700 5,355 0.693 14.00%
64k random write 3 2 58,300 3,645 2.055 10.00%
64k random write 3 4 32,300 2,019 7.435 5.00%
64k random write 3 8 23,300 1,457 20.592 4.00%
64k random write 3 16 41,800 2,616 22.939 6.00%
64k random write 3 32 86,300 5,393 22.138 14.00%
1024k read 1 1 19,900 19,900 1.002 5.00%
1024k read 1 2 31,600 31,600 1.267 7.00%
1024k read 1 4 43,700 43,700 1.825 11.00%
1024k read 1 8 50,300 50,300 3.180 14.00%
1024k read 1 16 52,400 52,400 6.098 16.00%
1024k write 1 1 8,290 8,290 2.400 8.00%
1024k write 1 2 8,693 8,693 4.614 9.00%
1024k write 1 4 8,607 8,607 9.290 9.00%
1024k write 1 8 8,559 8,559 18.684 9.00%

Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’): Mirror+Parity

The performance metrics for the dual-tier configuration in S2D highlight workload placed across both mirror and parity tiers.

In 4k random read patterns, IOPS reach up to 2,500,000, showcasing excellent scalability. The 4k random read/write (70/30) pattern results show up to 247,000 IOPS.

For 64k random read/write and 1M read/write tests, the system maintains strong throughput, with 52,000 MiB/s for 64k reads and 50,100 MiB/s for 1M reads, demonstrating S2D’s robust capability to handle complex data operations across tiers. However, its write performance drops when workloads exceed the mirror tier, which is expected.

VM count Pattern Numjobs IOdepth IOPs MiB/s Latency (ms) Node CPU usage % IOPs per 1% CPU usage
20 4k random read 3 4 814,000 3,179 0.293 27.00% 15,074
4k random read 3 8 739,000 2,886 0.642 26.00% 14,212
4k random read 3 16 1,003,000 3,918 0.956 29.00% 17,293
4k random read 3 32 1,556,000 6,078 1.232 42.00% 18,524
4k random read 3 64 2,190,000 8,554 1.749 55.00% 19,909
4k random read 3 128 2,500,000 9,766 3.068 68.00% 18,382
4k random read/write (70%/30%) 3 2 126,200 492 1.245 27.00% 2,337
4k random read/write (70%/30%) 3 4 108,200 422 2.442 23.00% 2,352
4k random read/write (70%/30%) 3 8 49,800 195 9.766 10.00% 2,490
4k random read/write (70%/30%) 3 16 225,800 882 5.690 37.00% 3,051
4k random read/write (70%/30%) 3 32 247,000 965 11.251 38.00% 3,250
4k random read/write (70%/30%) 3 64 231,400 903 25.634 33.00% 3,506
4k random write 3 2 51,400 201 2.324 24.00% 1,071
4k random write 3 4 58,600 229 4.094 26.00% 1,127
4k random write 3 8 58,600 229 8.170 26.00% 1,127
4k random write 3 16 74,800 292 12.868 29.00% 1,290
4k random write 3 32 74,900 293 25.659 29.00% 1,291
64k random read 3 2 316,000 19,750 0.378 18.00%
64k random read 3 4 488,000 30,500 0.490 26.00%
64k random read 3 8 377,000 23,560 1.296 22.00%
64k random read 3 16 601,000 37,562 1.596 27.00%
64k random read 3 32 832,000 52,000 2.307 38.00%
64k random write 3 1 14,700 919 4.078 13.00%
64k random write 3 2 15,200 950 7.883 17.00%
64k random write 3 4 14,400 900 16.656 17.00%
64k random write 3 8 14,600 913 32.938 17.00%
64k random write 3 16 14,700 919 65.238 18.00%
64k random write 3 32 14,400 900 132.694 18.00%
1024k read 1 1 19,900 19,900 1.002 5.00%
1024k read 1 2 31,600 31,600 1.230 8.00%
1024k read 1 4 42,400 42,400 1.882 11.00%
1024k read 1 8 47,600 47,600 3.363 13.00%
1024k read 1 16 50,100 50,100 6.379 16.00%
1024k write 1 1 1,482 1,482 13.496 4.00%
1024k write 1 2 1,573 1,573 25.448 4.00%
1024k write 1 4 2,295 2,295 34.817 5.00%
1024k write 1 8 2,187 2,187 73.178 5.00%

Overall, S2D shows exceptional read performance in both test cases, though its write performance is noticeably less consistent.

Storage efficiency is about 35.7% and could be even less if additional space is assigned for in-place repairs.

Benchmarking results in graphs:

With all benchmarks completed and data collected, we can now compare the results using graphical charts for a clearer understanding.

4k random read:

Figure 1: 4K RR (IOPS)

Let’s start with the 4K random read test, where Figure 1 showcases the performance in IOPS. The S2D in the ‘Mirror-accelerated parity’ configuration with 100% of the workload in the mirror tier reaches a remarkable 833,000 IOPS at 4 I/O queue depth, scaling up to 2,653,000 IOPS at 128 I/O queue depth.

Comparatively, the StarWind VSAN peaks at 881,000 IOPS at 128 I/O queue depth. Here, S2D outshines StarWind VSAN with approximately 200% more IOPS at higher depths.

So, what’s the magic behind S2D’s performance? It’s all about local reading. In a cluster shared volume (CSV) setup, S2D leverages the SMB 3.0 protocol to allow multiple hosts to access and perform I/O operations on a shared volume (if you want to explore this topic in more detail, please read here or check this article). If a VM is running on the node that owns the volume, it can read data directly from the local disk, bypassing the network stack. This local read path minimizes latency and maximizes performance, leading to impressive IOPS numbers.

However, there’s a catch. This local reading perk only works if the VM is on the volume-owning node. If not, the read operations have to go through the network to the owning node, which can and will slow things down! To keep things running smoothly, you need to keep an eye on where your VMs are running and move them to the appropriate nodes as necessary. It’s tricky!

 

Figure 2: 4K RR (Latency)

When it comes to random read latency, Figure 2 reveals that Storage Spaces Direct with 100% of the workload within the mirror tier also excels, starting at a low 0.286 ms at 4 I/O queue depth and increasing to 2.897 ms at 128 I/O queue depth. StarWind VSAN starts at 0.570 ms, reaching 8.827 ms at the same depth.

Even with the workload split between the mirror and parity tiers, S2D maintains superior read latency, starting at 0.293 ms and peaking at 3.068 ms. The latency advantage of S2D is again attributed to local reads.

 

Figure 3: 4K RR (IOPS per 1% CPU Usage)

Switching gears to efficiency, Figure 3 compares IOPS per 1% CPU usage during a 4k random read test. Storage Spaces Direct with 100% of the workload in the mirror tier proves highly efficient, delivering up to 21,704 IOPS per 1% CPU usage at 64 I/O queue depth, whereas StarWind VSAN peaks at 7,342 IOPS per 1% CPU usage at 128 I/O queue depth. This makes S2D approximately 196% more efficient.

Even when the workload spans both S2D tiers, mirror and parity, it maintains a strong efficiency advantage, reaching 19,909 IOPS per 1% CPU usage at 64 I/O queue depth.

4k random read/write 70/30:

Figure 4: 4K RR/RW 70%/30% (IOPS)

Next, let’s dive into mixed 70/30 read-write patterns. Figure 4 is key for understanding real-world performance because pure read or write workloads are rare in actual production.

Figure 4 shows the number of IOPS during the mixed 70%/30% 4k random read/write tests with Numjobs = 3.

Interestingly, with Storage Spaces Direct, there’s a noticeable drop in performance at queue depths 4 and 8. This performance drop is not observed in StarWind VSAN tests. StarWind maintains consistent performance, hitting 334,000 IOPS at queue depth 4 and 301,200 IOPS at queue depth 8.

In contrast, with 100% of the workload in the mirror tier, S2D’s performance drops to 114,600 IOPS at queue depth 4 and 62,800 IOPS at queue depth 8, representing reductions of approximately 65.7% and 79.1%, respectively, compared to StarWind VSAN. Fortunately, S2D shows a significant rebound in performance starting at QD=16, ultimately scoring about 25% higher than StarWind VSAN under the same conditions.

However, when the workload is distributed across both mirror and parity tiers, S2D struggles due to ReFS continuously moving new data from the mirror tier to the parity tier, which negatively affects performance. As a result, S2D records 126,200 IOPS at queue depth 2, drops to a low of 49,800 IOPS at QD=8, and then peaks at 247,000 IOPS at queue depth 32.

Meanwhile, StarWind VSAN outperforms S2D, achieving 241,500 IOPS at queue depth 2 and 534,000 IOPS at queue depth 32, highlighting its superior performance with IOPS figures that are 91.4% higher at queue depth 2 and 116.2% higher at queue depth 32 compared to S2D in the “Mirror+Parity” scenario.

 

Figure 5: 4K RR/RW 70%/30% (Latency)

Figure 5 examines latency for the mixed 4K random 70/30 workload. Storage Spaces Direct with 100% of the workload in the mirror tier starts at 0.382 ms at 2 I/O queue depth, reaching 6.899 ms at 64 I/O queue depth. StarWind VSAN, on the other hand, starts at 0.582 ms and goes up to 7.768 ms.

 

Figure 6: 4K RR/RW 70%/30% (IOPS per 1% CPU Usage)

Figure 6 explores the number of IOPS relative to 1% CPU utilization during the same mixed workload.

Storage Spaces Direct with workload within the mirror tier provides up to 10,180 IOPS per 1% CPU usage at 16 I/O queue depth, while StarWind VSAN peaks at 5,394 IOPS per 1% CPU usage. This makes S2D about 89% more efficient.

When the workload touches both tiers, mirror and parity, S2D achieves a maximum of 3,506 IOPS per 1% CPU usage at 64 I/O queue depth, about 35% lower than StarWind VSAN’s 5,394.

4k random write:

Figure 7: 4K RW (IOPS)

The ability to maintain consistent write performance across various queue depths is crucial for demanding virtualization environments. Figure 7 shows the amount of IOPS during 4k random write operations.

StarWind VSAN stands out with consistent performance across most queue depths, significantly outperforming S2D (100% of the workload in the mirror tier) from I/O queue depth 2 to 16.

With S2D and 100% of the workload in the mirror tier, there’s a big drop in performance at queue depths 4 and 8. At queue depth 4, Storage Spaces Direct scores about 46,800 IOPS, roughly a quarter of StarWind VSAN’s 192,000 IOPS. At queue depth 8, the gap widens even further, with StarWind VSAN delivering 584% more IOPS. This result is unexpected, as we anticipated better performance from S2D in this test compared to when the workload spans both mirror and parity tiers.

Interestingly, at queue depth 32 StarWind VSAN loses its advantage, scoring 167,000 IOPS, while S2D with 100% of the data in the mirror tier gains traction, achieving 186,000 IOPS. That said, when the workload hits both tiers, S2D is unable to do better and ends up at 74,900 IOPS.

 

Figure 8: 4K RW (Latency)

Latency for 4K random writes is also a critical factor. Since latency corresponds to prior IOPS results, the overall picture remains consistent in Figure 8.

StarWind VSAN demonstrates the lowest latency, starting at 0.859 ms with a 2 I/O queue depth and increasing to 3.689 ms at a 16 I/O queue depth. Virtual SAN significantly outperforms Storage Spaces Direct with 100% of the workload in the mirror tier, which starts at 1.499 ms (StarWind’s latency is 43% lower) and rises to 14.876 ms (75% lower) in the QD=16 test.

When comparing StarWind to S2D with the workload spanning both tiers, mirror and parity, the gap is even more noticeable: StarWind shows 63% lower latency at 2 I/O queue depth (0.859 ms vs. 2.324 ms) and 55% lower latency at 32 I/O queue depth (11.476 ms vs. 25.659 ms). Only against the mirror-only configuration at 32 I/O queue depth does StarWind VSAN demonstrate slightly higher latency, reaching 11.476 ms compared to S2D’s 10.174 ms, which is about 11% lower.

 

Figure 9: 4K RW (IOPS per 1% CPU Usage)

Efficiency in 4K random write workloads is measured in IOPS per 1% CPU usage, as shown in Figure 9.

Storage Spaces Direct with 100% of the workload in the mirror tier achieves up to 5,167 IOPS per 1% CPU usage, while StarWind VSAN peaks at 3,093 IOPS per 1% CPU usage, making S2D approximately 67% more efficient. However, when the workload utilizes both S2D tiers, mirror and parity, efficiency drops significantly, with a maximum of only 1,291 IOPS per 1% CPU usage, making it the least efficient of the three scenarios.

64k random read:

Figure 10: 64K RR (Throughput)

Moving to larger data blocks, Figure 10 illustrates the throughput performance for 64K random reads.

Storage Spaces Direct with 100% of the workload in the mirror tier significantly outpaces StarWind VSAN, achieving a peak of 53,500 MiB/s at 32 I/O queue depth compared to StarWind’s 14,562 MiB/s. This indicates that S2D delivers approximately 267% more throughput.

When the workload utilizes both S2D tiers, mirror and parity, it shows slightly lower throughput but still surpasses StarWind VSAN significantly. The higher performance in S2D is attributed to local reads, but remember, this efficiency is conditional on the VM running on the node that owns the volume. StarWind VSAN, in contrast, provides stable performance regardless of VM placement, eliminating the need for additional monitoring and VM-to-host binding.

 

Figure 11: 64K RR (Latency)

Figure 11 shows the latency for 64K random reads. The results align with the throughput data discussed earlier.

Here, S2D with 100% of the workload in the mirror tier maintains low latency due to local reads, starting at 0.376 ms and reaching 2.243 ms at 32 I/O queue depth. StarWind VSAN starts higher at 0.749 ms and peaks at 8.343 ms, meaning S2D’s latency is up to 73% lower.

 

Figure 12: 64K RR (CPU Usage)

In Figure 12, we examine CPU usage during 64K random reads.

Storage Spaces Direct with 100% of the workload in the mirror tier starts at 17% CPU usage at 2 I/O queue depth and peaks at 38% at 32 I/O queue depth. When the workload hits both tiers, mirror and parity, S2D shows consistent CPU usage trends, closely following the S2D “mirror-only” test results.

StarWind VSAN begins significantly higher at 35% and peaks at 41%, slightly above S2D. This indicates that S2D is more efficient in CPU usage, with StarWind VSAN using up to approximately 106% more CPU at low I/O queue depths and about 8% more at an I/O queue depth of 32.

64k random write:

Figure 13: 64K RW (Throughput)

Figure 13 illustrates the 64K random write throughput, highlighting performance differences across three scenarios.

Storage Spaces Direct with 100% of the workload in the mirror tier exhibits erratic performance, with notable drops at medium I/O queue depths. For example, throughput falls to 2,019 MiB/s at a 4 I/O queue depth, dips further to 1,457 MiB/s at 8 I/O queue depth, and then rebounds to 2,616 MiB/s at 16 I/O queue depth. This pattern reflects the behavior observed in 4K random write tests. Not stable. No good!

In contrast, StarWind VSAN delivers more consistent performance, surpassing S2D by 79.6% at a 4 I/O queue depth, by 167.6% at 8 I/O queue depth, and by 52% at 16 I/O queue depth.

When workloads span both tiers, mirror and parity, S2D shows significantly lower throughput across the board. StarWind VSAN outperforms S2D by 199% at a 1 I/O queue depth, with the performance gap widening to 343% at a 32 I/O queue depth.

This highlights StarWind’s capability to handle write operations with consistently high performance across varying I/O queue depths.

 

Figure 14: 64K RW (Latency)

Figure 14 displays the latency for 64K random writes, showing a similar trend.

StarWind VSAN delivers faster response times at lower I/O queue depths (4, 8, and 16), but S2D (with 100% of the workload in the mirror tier) takes the lead at a 32 I/O queue depth, achieving a lower latency of 22.138 ms compared to StarWind’s 30.150 ms.

When compared to S2D’s configuration with workloads spread across both tiers, mirror and parity, StarWind VSAN is significantly more efficient, providing 66.9% faster response times at a 1 I/O queue depth and 77.3% lower latency at a 32 I/O queue depth.

 

Figure 15: 64K RW (CPU usage)

Figure 15 highlights CPU usage during 64K random writes.

StarWind VSAN consistently shows higher CPU utilization compared to both Storage Spaces Direct configurations (Mirror-only and Mirror+Parity). At a 1 I/O queue depth, StarWind uses 25% CPU, which is 79% higher than S2D’s 14% with 100% of the workload in the mirror tier and 92% higher than S2D’s 13% in the mixed-tier test. This trend persists across different I/O queue depths, with StarWind maintaining higher CPU usage but delivering more consistent performance under varying workloads.

1M read:

Figure 16: 1024K R (Throughput)


Figure 16 presents the throughput results for 1024K reads, where S2D with 100% of the workload in the mirror tier significantly outperforms StarWind VSAN, reaching 52,000 MiB/s at a 16 I/O queue depth compared to StarWind’s 15,600 MiB/s — about 233% higher throughput.

Even when workloads are spread across both tiers, mirror and parity, S2D continues to outperform StarWind VSAN by a substantial margin. This impressive read performance from S2D is again due to local reads when the test VM is located on the CSV owner node.
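For clarity, the percentage gaps quoted throughout are relative differences against the slower side. A quick sanity check in Python, using the Figure 16 numbers above (the helper name is ours, not part of any benchmark tooling):

```python
def pct_higher(fast: float, slow: float) -> float:
    """Percent by which `fast` exceeds `slow`: (fast - slow) / slow * 100."""
    return (fast - slow) / slow * 100

# Figure 16, 16 I/O queue depth: S2D mirror-tier reads vs. StarWind VSAN
print(round(pct_higher(52_000, 15_600)))  # → 233
```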

 

Figure 17: 1024K R (Latency)


Figure 17 shows the latency results during the 1024K read test, reflecting a pattern similar to the throughput results.

S2D with 100% of the workload in the mirror tier demonstrates impressively low latency, benefiting from local reads. Latency starts at 1.002 ms and increases to 6.098 ms as I/O queue depth grows.

In contrast, StarWind VSAN starts at 1.998 ms and peaks at 20.625 ms, meaning S2D delivers up to 3.4 times lower latency than StarWind.

When workloads are distributed across both S2D tiers, mirror and parity, latency remains nearly identical to that of the mirror tier tests. It’s all thanks to the local reads!

 

Figure 18: 1024K R (CPU Usage)


Figure 18 highlights CPU usage during 1024K reads, where S2D demonstrates significantly lower resource consumption compared to StarWind VSAN.

With workloads in the mirror tier, S2D starts at 5% CPU usage at a 1 I/O queue depth and increases to 16% at a 16 I/O queue depth.

In contrast, StarWind VSAN begins at 26% and rises to 33%, meaning S2D uses about 63% less CPU on average.

Even when workloads span both tiers, mirror and parity, S2D maintains the same CPU usage levels as in the mirror-only benchmarks. S2D’s efficiency in local reads translates to more effective CPU usage, while StarWind requires more resources to sustain consistent performance.

1M write:

Figure 19: 1024K W (Throughput)


When we shift our focus to 1024K sequential write throughput, Figure 19 reveals that Storage Spaces Direct (S2D) with 100% of the workload in the mirror tier holds a significant performance advantage over StarWind VSAN. Specifically, S2D reaches 8,693 MiB/s at a 2 I/O queue depth, while StarWind VSAN manages 3,903 MiB/s. At an 8 I/O queue depth, S2D continues to dominate, hitting 8,559 MiB/s, compared to StarWind’s 4,156 MiB/s. This means S2D delivers approximately 122% higher throughput, if your workload fits into the mirror tier…

However, it’s important to note that this advantage is conditional. If the workload doesn’t fit into the mirror tier, S2D’s performance drops dramatically. This is evident in the multi-tiered test results, where StarWind VSAN outperforms Storage Spaces Direct by about 106% on average.

 

Figure 20: 1024K W (Latency)


Diving into the 1024K write latency as depicted in Figure 20, we see a consistent theme.

With 100% of its workload in the mirror tier, Storage Spaces Direct begins at a brisk 2.400 ms and climbs to 18.684 ms at 8 I/O queue depth. In comparison, StarWind VSAN starts at a slower 5.804 ms and escalates to a higher peak of 38.492 ms, which means S2D's latency is roughly half of StarWind's across the board.

However, when the workload spans both the mirror and parity tiers, everything gets flipped upside down. S2D records the highest latencies, starting at 13.496 ms and surging to 73.178 ms at 8 I/O queue depth. This again indicates a significant performance shift depending on how well the workload is aligned with S2D’s optimal tier configuration.

 

Figure 21: 1024K W (CPU Usage)


Figure 21 highlights CPU usage during 1024K writes. With 100% of the workload in the mirror tier, Storage Spaces Direct (S2D) begins at 8% CPU usage at a queue depth of 1 I/O and consistently holds at 9% across queue depths of 2, 4, and 8 I/O.

In contrast, StarWind VSAN begins at a much higher 24% and remains steady around 25% across all I/O queue depths. Based on these figures, S2D consumes roughly 65% less CPU on average, demonstrating significantly more efficient resource utilization compared to StarWind VSAN.

When the workload spans both tiers of S2D, mirror and parity, it continues to exhibit even lower CPU usage, starting at just 4% at 1 and 2 I/O queue depths and modestly rising to 5% at 4 and 8 I/O queue depths.

Additional benchmarking: 1 VM, 1 numjobs, 1 iodepth

To gain a deeper understanding of how StarWind VSAN and Storage Spaces Direct perform under specific synthetic conditions, we conducted additional benchmarks focusing on a single VM scenario, with numjobs = 1 and an I/O queue depth of 1. Typically, this is the best way to measure storage access latency.
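The article doesn't list the exact benchmark invocation. Assuming an fio-style job (with a Linux test VM, and the ioengine, runtime, and device path as placeholder assumptions rather than the authors' actual settings), the single-VM latency runs could look roughly like this:

```ini
; Hypothetical fio job file approximating the 1 VM / numjobs=1 / iodepth=1 runs.
; ioengine, runtime, and filename are assumptions, not the authors' exact settings.
[global]
ioengine=libaio
direct=1
time_based=1
runtime=60
filename=/dev/sdb        ; virtual disk under test inside the VM (assumed)
numjobs=1
iodepth=1

[4k-randread]
rw=randread
bs=4k

[4k-randwrite]
stonewall                ; start only after the previous job finishes
rw=randwrite
bs=4k

[4k-randwrite-sync]
stonewall
rw=randwrite
bs=4k
sync=1                   ; O_SYNC writes for the "synchronous" pattern
```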

Benchmark results in a table:

StarWind VSAN, Host mirroring + RAID5, 1 VM

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 1,112 | 4 | 0.897 |
| 4k random write | 1 | 1 | 501 | 2 | 1.991 |
| 4k random write (synchronous) | 1 | 1 | 226 | 1 | 4.415 |

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror tier (1 VM)

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 7,221 | 28 | 0.137 |
| 4k random write | 1 | 1 | 5,456 | 21 | 0.182 |
| 4k random write (synchronous) | 1 | 1 | 2,887 | 11 | 0.344 |

 

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror and parity tiers (1 VM)

| Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms) |
|---|---|---|---|---|---|
| 4k random read | 1 | 1 | 5,920 | 23 | 0.167 |
| 4k random write | 1 | 1 | 2,517 | 10 | 0.395 |
| 4k random write (synchronous) | 1 | 1 | 1,772 | 7 | 0.562 |

Benchmark results in graphs:

This section presents visual comparisons of the performance and latency metrics across storage configurations under research.

4k random read:

Figure 1: 4K RR (IOPS)


Figure 1 demonstrates IOPS for the 4K random read test at 1 I/O queue depth with numjobs=1. S2D, with 100% of the workload in the mirror tier, outperforms StarWind VSAN, delivering 7,221 IOPS.

This remarkable 550% increase over StarWind’s 1,112 IOPS is primarily due to S2D’s ability to leverage local reads and operate entirely at the host level. In contrast, StarWind VSAN, running inside a VM and mixing local and network I/O, usually faces a much longer data path, which impacts its performance negatively.

Even when S2D operates with data across both mirror and parity tiers, it maintains strong performance at 5,920 IOPS, still surpassing StarWind by 432%.

 

Figure 2: 4K RR (Latency)


Latency metrics for the 4K random read test at 1 I/O queue depth, as shown in Figure 2, similarly favor Storage Spaces Direct with 100% of the workload in the mirror tier, which records a swift 0.137 ms, roughly 6.5 times lower than StarWind VSAN's 0.897 ms. This advantage is again due to S2D's local read capabilities and direct host-level operation. Even in a mixed-tier setup, mirror and parity, S2D maintains its lead with a latency of 0.167 ms, still about 5.4 times lower than StarWind's.
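A useful cross-check: with numjobs=1 and iodepth=1 there is exactly one outstanding I/O, so mean latency is simply the reciprocal of IOPS. The table values above line up with this:

```python
def mean_latency_ms(iops: float) -> float:
    """With a single outstanding I/O, mean latency (ms) = 1000 / IOPS."""
    return 1000 / iops

print(round(mean_latency_ms(7_221), 3))  # → 0.138 (measured: 0.137 ms, S2D mirror tier)
print(round(mean_latency_ms(1_112), 3))  # → 0.899 (measured: 0.897 ms, StarWind VSAN)
```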

 

Figure 3: 4K RW (IOPS)


Figure 3 showcases the results of the 4K random write test at an I/O queue depth of 1 with numjobs=1. Storage Spaces Direct with 100% of the workload in the mirror tier achieves a remarkable 5,456 IOPS, an astounding 990% higher than StarWind VSAN's 501 IOPS. This significant advantage stems from S2D's ability to write directly to the mirror tier, bypassing the resource-intensive parity calculation and avoiding read-modify-write.

However, when S2D handles workloads across both mirror and parity tiers, performance drops to 2,517 IOPS due to the additional overhead of invalidating data in the parity tier. For a deeper dive into how reading and writing function in a mirror-accelerated parity scenario, please refer to the detailed explanation provided here.

On the other hand, StarWind VSAN, which writes directly to the RAID5 virtual LUN, experiences performance degradation due to the read-modify-write (RMW) operations and the extended I/O data path inherent in its VM-based operation. Despite these technical challenges, the performance of StarWind VSAN at queue depth of 1 appears unusually low, prompting us to initiate an investigation into this issue to uncover the underlying cause.
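The read-modify-write penalty mentioned above is easy to see in miniature. In this illustrative sketch (our own toy model, not StarWind's actual code), RAID5 parity is the XOR of the data blocks in a stripe, so updating a single block costs two reads and two writes on the back end:

```python
def raid5_small_write(stripe, parity, idx, new_block):
    """Update one data block in a RAID5 stripe via read-modify-write.

    Returns the new stripe, the new parity, and the back-end I/O count.
    """
    old_block = stripe[idx]                      # read old data   (I/O 1)
    # read old parity                            (I/O 2)
    new_parity = parity ^ old_block ^ new_block  # incremental parity update
    stripe = stripe[:idx] + [new_block] + stripe[idx + 1:]
    # write new data (I/O 3), write new parity (I/O 4)
    return stripe, new_parity, 4

stripe = [0b1010, 0b0110, 0b0001]
parity = stripe[0] ^ stripe[1] ^ stripe[2]

stripe, parity, ios = raid5_small_write(stripe, parity, 1, 0b1111)
assert parity == stripe[0] ^ stripe[1] ^ stripe[2]  # parity stays consistent
print(ios)  # → 4 back-end I/Os for a single front-end write
```

This fourfold amplification on small writes is why a write that lands in a mirror tier (two plain writes) is so much cheaper than one that hits a parity-protected pool.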

 

Figure 4: 4K RW (Latency)


Moving on to Figure 4, we examine the latency metrics for 4K random writes.

No surprises here. Storage Spaces Direct (S2D) continues to deliver superior performance, benefiting from its efficient data handling within the mirror tier and achieving an impressively low latency of 0.182 ms, roughly 11 times lower than StarWind VSAN's 1.991 ms.

Even when S2D operates across both mirror and parity tiers, it maintains a competitive latency of 0.395 ms, still roughly 5 times lower than StarWind's.

 

Figure 5: 4K RW Synchronous (IOPS)


In our synchronous 4K random write single-threaded tests, as shown in Figure 5, Storage Spaces Direct with the dataset entirely in the mirror tier once again takes the lead, achieving 2,887 IOPS, a staggering 1,177% increase over StarWind VSAN's 226 IOPS.

This significant performance boost is attributed to the same factors observed in asynchronous 4K random write test, where S2D benefits from direct writes to the mirror tier, effectively bypassing the resource-heavy parity calculations and avoiding read-modify-write.

Even in the “mixed tiers” setup, S2D maintains a strong advantage, delivering 1,772 IOPS and still outpacing StarWind by 684%.

 

Figure 6: 4K RW Synchronous (Latency)


Figure 6 highlights the latency results for the synchronous 4K random write single-threaded test, further confirming S2D's performance edge.

With 100% of the workload within the mirror tier, Storage Spaces Direct achieves a write latency of 0.344 ms, nearly 13 times lower than StarWind VSAN's 4.415 ms.

Even when using the mirror and parity tiers, S2D maintains a strong latency advantage at 0.562 ms, roughly 8 times lower than StarWind's. This superior performance stems from S2D's efficient data handling, consistently delivering lower latency across varying configurations.

Conclusion

To sum it up, both Storage Spaces Direct and StarWind VSAN come with their own set of perks and trade-offs for your IT infrastructure.

Storage Spaces Direct shines in read performance, particularly when virtual machines are aligned with CSV owner nodes. However, we observed some unexpected performance issues during 4K and 64K random-write tests, where S2D sometimes underperformed with data in the mirror tier compared to when it spanned both mirror and parity tiers. This highlights the need for careful monitoring and VM data placement to ensure optimal performance. Mismanagement of workloads can lead to significant performance drops, particularly at certain queue depths. Additionally, S2D requires extra space for fault tolerance, which can impact overall storage efficiency.

On the other hand, StarWind VSAN proves to be a solid choice for high-performance environments, especially with mixed read/write or write-heavy workloads. It consistently delivers superior write performance under load, regardless of VM placement, and offers better capacity efficiency. However, StarWind VSAN lacks the local read boost that S2D provides, can be more demanding on CPU resources, and showed some anomalies in single-threaded tests.

So, if you’re looking for exceptional read performance and don’t mind keeping a close eye on your workloads, S2D is a great option. But if you’re after consistent write and mixed I/O performance with better capacity efficiency, all in a rugged ‘fire-and-forget’ mode, StarWind VSAN is the way to go.

Stay tuned for our upcoming articles, where we’ll dive deeper into these solutions to give you a bigger picture of how they can fit into your IT strategy.
