Introduction
I/O is what makes or breaks a system. Choosing the right data storage solution is crucial for your system’s performance and reliability. If you’re a Hyper-V fanboy and haven’t spent the last ten years living under a rock, you’re likely aware of both StarWind Virtual SAN (VSAN) and Microsoft Storage Spaces Direct (S2D). So, which one should you go for? In this article, we dive into these two popular options, comparing their performance, storage efficiency, and how they behave in a 2-node Hyper-V cluster setup. By the end, you’ll have a better idea of which solution could be the best fit for you.
To compare these two fairly, we set up a 2-node Hyperconverged Infrastructure (HCI) Hyper-V cluster under two different configurations:
- StarWind VSAN, NVMe-oF, TCP
- Full host mirroring (basically a ‘network RAID1’) combined with RAID5 for local NVMe pool protection, so effectively RAID51 for the whole configuration.
- Microsoft S2D (‘Mirror-accelerated parity’), TCP
- We tested two specific corner cases: a) the entire test workload is placed in the mirror tier to check the ‘best case’ scenario, and b) the workload is split between the mirror and parity tiers to simulate a ‘real-world’ scenario. We deliberately skipped c), ‘parity only’, the ‘worst case’, for obvious reasons: it’s just unusable…
Disclaimer: For simplicity, and because it isn’t a production environment, we used a replicated disk device as a witness. While this setup works fine in a lab, it should be avoided in production; a physical, out-of-cluster witness is recommended instead.
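For reference, pointing a production cluster at an external witness is a one-liner. Here is a minimal sketch; the cluster name, share path, and storage account details below are placeholders, not values from our lab:

```powershell
# Option 1: file share witness hosted outside the cluster (placeholder share path)
Set-ClusterQuorum -Cluster "HV-Cluster" -FileShareWitness "\\witness-srv\HVC-Witness"

# Option 2: Azure cloud witness (placeholder storage account name and key)
Set-ClusterQuorum -Cluster "HV-Cluster" -CloudWitness `
    -AccountName "mystorageaccount" -AccessKey "<storage-account-key>"
```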
High-level interconnect diagram: StarWind VSAN, NVMe-oF, TCP
In this setup, each Hyper-V node is equipped with 5 NVMe drives, passed through to the StarWind Linux-based Controller Virtual Machine (CVM) using the Hyper-V ‘PCI pass-thru’ mechanism. Inside the CVM, these NVMe drives are assembled into a single RAID5 virtual LUN. On top of this LUN, two StarWind High Availability (HA) devices are created, ensuring data replication and continuous availability. StarWind NVMe-oF Initiator was chosen for uplink connectivity due to the lack of a native Microsoft NVMe-oF initiator (Microsoft was expected to introduce full NVMe-oF support in Windows Server 2025, but this hasn’t happened yet…), and it ‘brings’ these StarWind HA devices to the Hyper-V nodes. Two Cluster Shared Volumes (CSVs) are then created on these connected devices, one per Hyper-V node, according to Microsoft’s best practices.
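Hyper-V implements this ‘PCI pass-thru’ via Discrete Device Assignment (DDA). As a minimal sketch, handing one NVMe drive to the CVM could look like the snippet below; the device friendly-name filter and the “StarWindCVM” VM name are placeholders, not our exact lab values:

```powershell
# Find the NVMe drive and grab its PCIe location path (placeholder friendly name)
$nvme = Get-PnpDevice -FriendlyName "*Micron*7450*" | Select-Object -First 1
$locationPath = (Get-PnpDeviceProperty -InstanceId $nvme.InstanceId `
    -KeyName DEVPKEY_Device_LocationPaths).Data[0]

# Disable the device on the host, dismount it, and assign it to the CVM
Disable-PnpDevice -InstanceId $nvme.InstanceId -Confirm:$false
Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
Add-VMAssignableDevice -LocationPath $locationPath -VMName "StarWindCVM"
```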
High-level interconnect diagram: Microsoft S2D (‘Mirror-accelerated parity’), TCP
This scenario covers two use cases, both within the ‘mirror-accelerated parity’ S2D configuration:
- Workload placed entirely in the mirror tier to maximize performance by keeping data in the faster tier. This is the ‘best case’, achievable only with a very lightly used (or underutilized) S2D cluster. When testing your own S2D clusters with ‘mirror-accelerated parity,’ make sure this isn’t the only case you test, or you’ll end up in trouble!
- Workload placed across both tiers, simulating a more balanced use case where data moves between the mirror and parity tiers. This reflects more ‘real-world’ conditions, where the workload doesn’t fit entirely in the mirror tier and the Resilient File System (ReFS) starts offloading data to the parity tier. For the sake of the experiment, we also attempted to force a test where writes go directly to the parity tier, the ‘worst case’.
Two storage tiers are set up with different resiliency settings: ‘Mirror’ for ‘performance’ and ‘Parity’ for ‘capacity,’ with the following parameters:
    New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedPerformance -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4
    New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedCapacity -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk -NumberOfColumns 4
Volumes are allocated with 20% in the mirror tier and 80% in the parity tier, adhering to Microsoft’s recommendations:
    New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume01 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB
    New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume02 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB
ReFS manages data movement between these tiers to optimize performance. The threshold at which ReFS starts rotating data from the mirror tier to the parity tier was left at the default of 85%.
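If you ever need to experiment with that threshold, Microsoft’s mirror-accelerated parity tuning guidance exposes it as a registry value. We left it alone; treat the exact value name below as an assumption and double-check it against the current documentation for your OS build:

```powershell
# Lower the fill ratio at which ReFS starts rotating data out of the mirror tier
# (85 is the default; value name taken from Microsoft's tuning guidance -- verify
# against current docs before using it in production)
Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Policies" `
    -Name "DataDestageSsdFillRatioThreshold" -Value 75
```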
Capacity efficiency
Storage efficiency is a big deal when choosing a storage solution!
- StarWind VSAN
In this configuration (5x NVMe drives per Hyper-V cluster node), it achieves 40% storage efficiency, since VSAN combines host mirroring with RAID5 for local NVMe pool protection.
- Microsoft S2D (‘Mirror-accelerated parity’)
In this setup, it delivers 35.7% storage efficiency (20% mirror, 80% parity), though this can vary based on the volume percentage allocated to the mirror tier. For more on calculating storage efficiency for mirror-accelerated parity, check out the provided link. A quick back-of-the-envelope check of both figures follows right after this list.
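The arithmetic behind both figures, as a minimal sketch. The ~40% efficiency assumed for the nested parity tier (four columns per node) is our estimate, picked because it reproduces the quoted 35.7%:

```powershell
# StarWind VSAN: two-way host mirroring (x0.5) on top of a 5-drive RAID5 (4/5 usable)
$vsanEfficiency = 0.5 * (4 / 5)                        # 0.40 -> 40%

# S2D nested mirror-accelerated parity, 20% mirror / 80% parity volume split:
# nested two-way mirror keeps 4 data copies (25% efficient);
# nested parity assumed ~40% efficient with 4 columns per node
$s2dEfficiency = 1 / (0.20 / 0.25 + 0.80 / 0.40)       # ~0.357 -> 35.7%

"VSAN: $($vsanEfficiency * 100)%  S2D: $([math]::Round($s2dEfficiency * 100, 1))%"
```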
Microsoft recommends leaving some space unallocated in the storage pool so volumes can perform ‘in-place’ repairs after an individual drive failure. If enough free space is available, an immediate, in-place, parallel repair can restore volumes to full resiliency even before the failed drive is replaced, and this process happens automatically. In our setup, the recommended reserve is 5.82 TB (20% of the total pool size).
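To see how much capacity is actually left unallocated for such repairs, a quick look at the pool counters is enough. A minimal sketch using the pool name from the commands above:

```powershell
# Show total, allocated, and remaining (reserve) capacity of the S2D pool
Get-StoragePool -FriendlyName s2d-pool |
    Select-Object FriendlyName,
        @{ n = 'SizeTB';      e = { [math]::Round($_.Size / 1TB, 2) } },
        @{ n = 'AllocatedTB'; e = { [math]::Round($_.AllocatedSize / 1TB, 2) } },
        @{ n = 'ReserveTB';   e = { [math]::Round(($_.Size - $_.AllocatedSize) / 1TB, 2) } }
```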
When planning your own S2D deployment, keep these factors in mind to make sure you get the best performance and storage efficiency for your needs!
Testbed Overview:
When evaluating StarWind VSAN and Microsoft S2D storage solutions, we didn’t cut any corners. Our testbed was solid and carefully configured to mirror real-world environments, ensuring our results are relevant, reliable, and reproducible. Here’s a snapshot of the hardware and software components that powered our tests:
Hardware:
Server model | Supermicro SYS-220U-TNR |
---|---|
CPU | Intel(R) Xeon(R) Platinum 8352Y @2.2GHz |
Sockets | 2 |
Cores/Threads | 64/128 |
RAM | 256GB |
NICs | 2x Mellanox ConnectX®-6 EN 200GbE (MCX613106A-VDA) |
Storage | 5x NVMe Micron 7450 MAX: U.3 3.2TB |
Software:
Windows Server | Windows Server 2022 Datacenter 21H2 OS build 20348.2527 |
---|---|
StarWind VSAN | Version V8 (build 15469, CVM 20240530) (kernel – 5.15.0-113-generic) |
StarWind NVMe-oF Initiator | 2.0.0.672(rev 674).Setup.486 |
StarWind CVM parameters:
CPU | 24 vCPU |
---|---|
RAM | 32GB |
NICs | 1x network adapter for management; 4x network adapters for client I/O and synchronization |
Storage | RAID5 (5x NVMe Micron 7450 MAX: U.3 3.2TB) |
Testing methodology:
Benchmarks were run using the FIO utility in client/server mode. We set up a total of 20 virtual machines (VMs), with 10 VMs on each server node. Each VM had 4 vCPUs, 8GB of RAM, and three RAW virtual disks connected to separate virtual SCSI controllers.
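For reference, here is an illustrative sketch of how such a run can be driven with FIO in client/server mode. These are not our exact job files: the hostnames (vm1..vm20), the target path, and the 4k/QD16 write pattern shown are placeholders.

```powershell
# Inside each test VM, fio waits for work from the controller:
fio --server

# On the controller, a single command drives all 20 VMs at once:
$clients = 1..20 | ForEach-Object { "--client=vm$_" }
fio @clients `
    --name=4k-randwrite --ioengine=windowsaio --direct=1 --thread `
    --rw=randwrite --bs=4k --numjobs=3 --iodepth=16 `
    --filename='D\:\fio.raw' --size=100G `
    --time_based --runtime=1800 --group_reporting
```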
Test Scenarios:
- Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’)
- Mirror-only. For tests where the workload is placed entirely in the mirror tier, each virtual disk was limited to 10GB in size.
- Mirror+Parity. For tests utilizing both the mirror and parity tiers, each virtual disk was 100GB in size.
- StarWind Virtual SAN
- For all tests, each virtual disk was 100GB in size.
Data Patterns Tested:
- 4k random read
- 4k random read/write (70/30)
- 4k random write
- 64k random read
- 64k random write
- 1M read
- 1M write
Pre-Test Warm-Up:
Before running tests, we filled the virtual disks with random data and ‘warmed them up’ with specific patterns to ensure stable flash performance.
- 4k random read/write (70/30) and 4k random write: VM disks were warmed up with a 4k random write pattern for 4 hours.
- 64k random write: VM disks were warmed up with a 64k random write pattern for 2 hours.
Test Execution:
- Duration: Read tests were conducted for 600 seconds, and write tests lasted 1800 seconds.
- Repetition: All tests were repeated three times, and the average value was used as the final result.
Specific Configurations:
- Microsoft Storage Spaces Direct (S2D)
Following Microsoft’s recommendations for the S2D+ReFS scenario, test VMs were placed on the CSV owner node to avoid ReFS redirecting requests to another node, ensuring local data reads without touching the network stack and reducing network utilization on writes. Each VHDX file was placed in a separate subdirectory to optimize ReFS metadata operations and reduce latency.
- StarWind Virtual SAN (VSAN)
VMs were evenly distributed across hosts without being pinned to the node that owns the volume. Each VHDX file was placed in a separate subdirectory to maintain consistent performance.
Benchmarking local NVMe performance:
Before diving into the full evaluation, we checked if the NVMe drives lived up to the vendor’s promises by running a series of tests to see if their performance matched up. Here’s an image showing the vendor-claimed performance:
Using the FIO utility in client/server mode, we tested the performance of the NVMe SSDs in our server within a local storage setup. We applied different patterns to see how the NVMe SSDs handled various types of data. The results are shown below:
1x NVMe Micron 7450 MAX: U.3 3.2TB

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
---|---|---|---|---|---
4k random read | 6 | 32 | 997,000 | 3,894 | 0.192 |
4k random read/write 70/30 | 6 | 16 | 531,000 | 2,073 | 0.142 |
4k random write | 4 | 4 | 385,000 | 1,505 | 0.041 |
64k random read | 8 | 8 | 92,900 | 5,807 | 0.688 |
64k random write | 2 | 1 | 27,600 | 1,724 | 0.072 |
1M read | 1 | 8 | 6,663 | 6,663 | 1.200 |
1M write | 1 | 2 | 5,134 | 5,134 | 0.389 |
Our tests showed that the NVMe drives lived up to what the vendor promised. Whether handling small 4k reads or large 1M writes, they delivered on speed and consistency.
Benchmark results in a table:
The benchmarking results are presented in tables to illustrate performance metrics such as IOPS, throughput (MiB/s), latency (ms), and CPU usage. An additional metric, “IOPS per 1% CPU usage,” highlights the performance dependency on the CPU usage for 4k random read/write patterns. This parameter is calculated using the following formula:
IOPS per 1% CPU usage = IOPS / Node count / Node CPU usage
Where:
- IOPS represents the number of I/O operations per second for each pattern.
- Node count is 2 nodes in our case.
- Node CPU usage denotes the CPU usage of one node during the test.
By incorporating this additional metric, we aimed to provide deeper insights into how CPU usage correlates with IOPS, offering a more nuanced understanding of performance characteristics.
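As a quick worked example, plugging the first StarWind 4k random read row from the tables below (420,000 IOPS, 2 nodes, 45% CPU per node) into the formula gives roughly 4,667 IOPS per 1% CPU usage:

```powershell
# IOPS per 1% CPU usage = IOPS / node count / node CPU usage (%)
$iops = 420000; $nodes = 2; $cpuUsagePercent = 45
[math]::Round($iops / $nodes / $cpuUsagePercent)   # ~4667
```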
Now let’s delve into the detailed benchmark results for each storage configuration.
StarWind Virtual SAN (VSAN)
The table illustrates StarWind VSAN’s performance under various workload patterns and configurations. For 4k random reads, IOPS range from 420,000 at lower queue depths to 881,000 at higher depths. In the mixed 4k random read/write (70/30) test, it achieves up to 561,000 IOPS, showcasing its prowess in handling mixed workloads.
In the 64k and 1M read/write patterns, StarWind VSAN reaches up to 15.2 GB/s, demonstrating its ability to handle these workloads effectively.
VM count | Pattern | Numjobs | IOdepth | IOPs | MiB/s | Latency (ms) | Node CPU usage % | IOPs per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 420,000 | 1,641 | 0.570 | 45.00% | 4,667 |
4k random read | 3 | 8 | 307,000 | 1,200 | 1.515 | 35.00% | 4,386 | |
4k random read | 3 | 16 | 546,000 | 2,134 | 1.736 | 50.00% | 5,460 | |
4k random read | 3 | 32 | 741,000 | 2,895 | 2.586 | 57.00% | 6,500 | |
4k random read | 3 | 64 | 836,000 | 3,265 | 4.567 | 58.00% | 7,207 | |
4k random read | 3 | 128 | 881,000 | 3,442 | 8.827 | 60.00% | 7,342 | |
4k random read/write (70%/30%) | 3 | 2 | 241,500 | 943 | 0.582 | 39.00% | 3,096 | |
4k random read/write (70%/30%) | 3 | 4 | 334,000 | 1,305 | 0.843 | 45.00% | 3,711 | |
4k random read/write (70%/30%) | 3 | 8 | 301,200 | 1,177 | 1.683 | 42.00% | 3,586 | |
4k random read/write (70%/30%) | 3 | 16 | 416,000 | 1,625 | 2.507 | 48.00% | 4,333 | |
4k random read/write (70%/30%) | 3 | 32 | 534,000 | 2,086 | 4.002 | 53.00% | 5,038 | |
4k random read/write (70%/30%) | 3 | 64 | 561,000 | 2,191 | 7.768 | 52.00% | 5,394 | |
4k random write | 3 | 2 | 139,000 | 541 | 0.859 | 33.00% | 2,106 | |
4k random write | 3 | 4 | 192,000 | 751 | 1.246 | 39.00% | 2,462 | |
4k random write | 3 | 8 | 238,000 | 928 | 2.018 | 44.00% | 2,705 | |
4k random write | 3 | 16 | 260,000 | 1,015 | 3.689 | 44.00% | 2,955 | |
4k random write | 3 | 32 | 167,000 | 653 | 11.476 | 27.00% | 3,093 | |
64k random read | 3 | 2 | 160,000 | 10,000 | 0.749 | 35.00% | ||
64k random read | 3 | 4 | 200,000 | 12,500 | 1.205 | 39.00% | ||
64k random read | 3 | 8 | 210,000 | 13,125 | 2.299 | 40.00% | ||
64k random read | 3 | 16 | 228,000 | 14,250 | 4.203 | 41.00% | ||
64k random read | 3 | 32 | 233,000 | 14,562 | 8.343 | 41.00% | ||
64k random write | 3 | 1 | 44,000 | 2,751 | 1.350 | 25.00% | ||
64k random write | 3 | 2 | 51,900 | 3,242 | 2.311 | 27.00% | ||
64k random write | 3 | 4 | 58,300 | 3,645 | 4.108 | 28.00% | ||
64k random write | 3 | 8 | 62,400 | 3,900 | 7.689 | 29.00% | ||
64k random write | 3 | 16 | 63,600 | 3,975 | 12.070 | 29.00% | ||
64k random write | 3 | 32 | 63,800 | 3,987 | 30.150 | 29.00% | ||
1024k read | 1 | 1 | 10,000 | 10,000 | 1.998 | 26.00% | ||
1024k read | 1 | 2 | 12,400 | 12,400 | 3.225 | 29.00% | ||
1024k read | 1 | 4 | 14,100 | 14,100 | 5.668 | 31.00% | ||
1024k read | 1 | 8 | 15,200 | 15,200 | 10.574 | 32.00% | ||
1024k read | 1 | 16 | 15,600 | 15,600 | 20.625 | 33.00% | ||
1024k write | 1 | 1 | 3,443 | 3,443 | 5.804 | 24.00% | ||
1024k write | 1 | 2 | 3,903 | 3,903 | 10.241 | 25.00% | ||
1024k write | 1 | 4 | 4,086 | 4,086 | 19.561 | 25.00% | ||
1024k write | 1 | 8 | 4,156 | 4,156 | 38.492 | 25.00% |
Overall, StarWind VSAN shows great performance at 4k random read/write patterns, consistent read and write performance regardless of VM location, and impressive storage efficiency at 40%.
Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’): Mirror-only
The table below shows S2D’s performance with a ‘mirror-accelerated parity’ configuration, focusing on workloads that 100% fit into the mirror tier.
For 4k random read scenarios, IOPS peak at 2,653,000, highlighting exceptional read performance thanks to 100% local-bound reads. In the 4k random read/write (70/30) pattern, results reach up to 654,000 IOPS.
The 64k random read/write and 1M read/write tests maintain high throughput, with 53.5 GB/s for 64k reads and 52.4 GB/s for 1M reads. S2D shows exceptional read performance when VMs are on the volume-owning node, and robust write performance within the mirror tier. However, read performance declines if local reading requirements aren’t met, and unusual performance drops occur at certain queue depths. Additionally, write performance can drop if VMs are running on a node that is not the volume owner.
VM count | Pattern | Numjobs | IOdepth | IOPs | MiB/s | Latency (ms) | Node CPU usage % | IOPs per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 833,000 | 3,256 | 0.286 | 27.00% | 15,426 |
4k random read | 3 | 8 | 752,000 | 2,937 | 0.648 | 21.00% | 17,905 | |
4k random read | 3 | 16 | 1,083,000 | 4,230 | 0.884 | 29.00% | 18,672 | |
4k random read | 3 | 32 | 1,646,000 | 6,429 | 1.165 | 41.00% | 20,073 | |
4k random read | 3 | 64 | 2,344,000 | 9,158 | 1.637 | 54.00% | 21,704 | |
4k random read | 3 | 128 | 2,653,000 | 10,363 | 2.897 | 67.00% | 19,799 | |
4k random read/write (70%/30%) | 3 | 2 | 324,300 | 1,266 | 0.382 | 20.00% | 8,108 | |
4k random read/write (70%/30%) | 3 | 4 | 114,600 | 447 | 2.103 | 7.00% | 8,186 | |
4k random read/write (70%/30%) | 3 | 8 | 62,800 | 245 | 7.659 | 4.00% | 7,850 | |
4k random read/write (70%/30%) | 3 | 16 | 509,000 | 1,988 | 1.939 | 25.00% | 10,180 | |
4k random read/write (70%/30%) | 3 | 32 | 614,000 | 2,398 | 3.564 | 31.00% | 9,903 | |
4k random read/write (70%/30%) | 3 | 64 | 654,000 | 2,554 | 6.899 | 34.00% | 9,618 | |
4k random write | 3 | 2 | 80,300 | 314 | 1.499 | 9.00% | 4,461 | |
4k random write | 3 | 4 | 46,800 | 183 | 5.116 | 6.00% | 3,900 | |
4k random write | 3 | 8 | 34,800 | 136 | 13.788 | 4.00% | 4,350 | |
4k random write | 3 | 16 | 64,700 | 253 | 14.876 | 7.00% | 4,621 | |
4k random write | 3 | 32 | 186,000 | 728 | 10.174 | 18.00% | 5,167 | |
64k random read | 3 | 2 | 317,000 | 19,812 | 0.376 | 17.00% | ||
64k random read | 3 | 4 | 498,000 | 31,125 | 0.478 | 25.00% | ||
64k random read | 3 | 8 | 424,000 | 26,500 | 1.142 | 22.00% | ||
64k random read | 3 | 16 | 623,000 | 38,937 | 1.539 | 27.00% | ||
64k random read | 3 | 32 | 856,000 | 53,500 | 2.243 | 38.00% | ||
64k random write | 3 | 1 | 85,700 | 5,355 | 0.693 | 14.00% | ||
64k random write | 3 | 2 | 58,300 | 3,645 | 2.055 | 10.00% | ||
64k random write | 3 | 4 | 32,300 | 2,019 | 7.435 | 5.00% | ||
64k random write | 3 | 8 | 23,300 | 1,457 | 20.592 | 4.00% | ||
64k random write | 3 | 16 | 41,800 | 2,616 | 22.939 | 6.00% | ||
64k random write | 3 | 32 | 86,300 | 5,393 | 22.138 | 14.00% | ||
1024k read | 1 | 1 | 19,900 | 19,900 | 1.002 | 5.00% | ||
1024k read | 1 | 2 | 31,600 | 31,600 | 1.267 | 7.00% | ||
1024k read | 1 | 4 | 43,700 | 43,700 | 1.825 | 11.00% | ||
1024k read | 1 | 8 | 50,300 | 50,300 | 3.180 | 14.00% | ||
1024k read | 1 | 16 | 52,400 | 52,400 | 6.098 | 16.00% | ||
1024k write | 1 | 1 | 8,290 | 8,290 | 2.400 | 8.00% | ||
1024k write | 1 | 2 | 8,693 | 8,693 | 4.614 | 9.00% | ||
1024k write | 1 | 4 | 8,607 | 8,607 | 9.290 | 9.00% | ||
1024k write | 1 | 8 | 8,559 | 8,559 | 18.684 | 9.00% |
Microsoft Storage Spaces Direct (‘Mirror-accelerated parity’): Mirror+Parity
The performance metrics for the dual-tier configuration in S2D highlight workload placed across both mirror and parity tiers.
In 4k random read patterns, IOPS reach up to 2,500,000, showcasing excellent scalability. The 4k random read/write (70/30) pattern results show up to 247,000 IOPS.
For 64k random read/write and 1M read/write tests, the system maintains strong throughput, with 52,000 MiB/s for 64k reads and 50,100 MiB/s for 1M reads, demonstrating S2D’s robust capability to handle complex data operations across tiers. However, its write performance drops when workloads exceed the mirror tier, which is expected.
VM count | Pattern | Numjobs | IOdepth | IOPs | MiB/s | Latency (ms) | Node CPU usage % | IOPs per 1% CPU usage |
---|---|---|---|---|---|---|---|---|
20 | 4k random read | 3 | 4 | 814,000 | 3,179 | 0.293 | 27.00% | 15,074 |
4k random read | 3 | 8 | 739,000 | 2,886 | 0.642 | 26.00% | 14,212 | |
4k random read | 3 | 16 | 1,003,000 | 3,918 | 0.956 | 29.00% | 17,293 | |
4k random read | 3 | 32 | 1,556,000 | 6,078 | 1.232 | 42.00% | 18,524 | |
4k random read | 3 | 64 | 2,190,000 | 8,554 | 1.749 | 55.00% | 19,909 | |
4k random read | 3 | 128 | 2,500,000 | 9,766 | 3.068 | 68.00% | 18,382 | |
4k random read/write (70%/30%) | 3 | 2 | 126,200 | 492 | 1.245 | 27.00% | 2,337 | |
4k random read/write (70%/30%) | 3 | 4 | 108,200 | 422 | 2.442 | 23.00% | 2,352 | |
4k random read/write (70%/30%) | 3 | 8 | 49,800 | 195 | 9.766 | 10.00% | 2,490 | |
4k random read/write (70%/30%) | 3 | 16 | 225,800 | 882 | 5.690 | 37.00% | 3,051 | |
4k random read/write (70%/30%) | 3 | 32 | 247,000 | 965 | 11.251 | 38.00% | 3,250 | |
4k random read/write (70%/30%) | 3 | 64 | 231,400 | 903 | 25.634 | 33.00% | 3,506 | |
4k random write | 3 | 2 | 51,400 | 201 | 2.324 | 24.00% | 1,071 | |
4k random write | 3 | 4 | 58,600 | 229 | 4.094 | 26.00% | 1,127 | |
4k random write | 3 | 8 | 58,600 | 229 | 8.170 | 26.00% | 1,127 | |
4k random write | 3 | 16 | 74,800 | 292 | 12.868 | 29.00% | 1,290 | |
4k random write | 3 | 32 | 74,900 | 293 | 25.659 | 29.00% | 1,291 | |
64k random read | 3 | 2 | 316,000 | 19,750 | 0.378 | 18.00% | ||
64k random read | 3 | 4 | 488,000 | 30,500 | 0.490 | 26.00% | ||
64k random read | 3 | 8 | 377,000 | 23,560 | 1.296 | 22.00% | ||
64k random read | 3 | 16 | 601,000 | 37,562 | 1.596 | 27.00% | ||
64k random read | 3 | 32 | 832,000 | 52,000 | 2.307 | 38.00% | ||
64k random write | 3 | 1 | 14,700 | 919 | 4.078 | 13.00% | ||
64k random write | 3 | 2 | 15,200 | 950 | 7.883 | 17.00% | ||
64k random write | 3 | 4 | 14,400 | 900 | 16.656 | 17.00% | ||
64k random write | 3 | 8 | 14,600 | 913 | 32.938 | 17.00% | ||
64k random write | 3 | 16 | 14,700 | 919 | 65.238 | 18.00% | ||
64k random write | 3 | 32 | 14,400 | 900 | 132.694 | 18.00% | ||
1024k read | 1 | 1 | 19,900 | 19,900 | 1.002 | 5.00% | ||
1024k read | 1 | 2 | 31,600 | 31,600 | 1.230 | 8.00% | ||
1024k read | 1 | 4 | 42,400 | 42,400 | 1.882 | 11.00% | ||
1024k read | 1 | 8 | 47,600 | 47,600 | 3.363 | 13.00% | ||
1024k read | 1 | 16 | 50,100 | 50,100 | 6.379 | 16.00% | ||
1024k write | 1 | 1 | 1,482 | 1,482 | 13.496 | 4.00% | ||
1024k write | 1 | 2 | 1,573 | 1,573 | 25.448 | 4.00% | ||
1024k write | 1 | 4 | 2,295 | 2,295 | 34.817 | 5.00% | ||
1024k write | 1 | 8 | 2,187 | 2,187 | 73.178 | 5.00% |
Overall, S2D shows exceptional read performance in both test cases, though its write performance is far less consistent.
Storage efficiency is about 35.7% and could be even lower if additional space is reserved for in-place repairs.
Benchmarking results in graphs:
With all benchmarks completed and data collected, we can now compare the results using graphical charts for a clearer understanding.
4k random read:
Let’s start with the 4K random read test, where Figure 1 showcases the performance in IOPS. The S2D in the ‘Mirror-accelerated parity’ configuration with 100% of the workload in the mirror tier reaches a remarkable 833,000 IOPS at 4 I/O queue depth, scaling up to 2,653,000 IOPS at 128 I/O queue depth.
Comparatively, the StarWind VSAN peaks at 881,000 IOPS at 128 I/O queue depth. Here, S2D outshines StarWind VSAN with approximately 200% more IOPS at higher depths.
So, what’s the magic behind S2D’s performance? It’s all about local reading. In a cluster shared volume (CSV) setup, S2D leverages the SMB 3.0 protocol to allow multiple hosts to access and perform I/O operations on a shared volume (if you want to explore this topic in more detail, please read here or check this article). If a VM is running on the node that owns the volume, it can read data directly from the local disk, bypassing the network stack. This local read path minimizes latency and maximizes performance, leading to impressive IOPS numbers.
However, there’s a catch. This local reading perk only works if the VM is on the volume-owning node. If not, the read operations have to go through the network to the owning node, which can and will slow things down! To keep things running smoothly, you need to keep an eye on where your VMs are running and move them to the appropriate nodes as necessary. It’s tricky!
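If you go the S2D route anyway, checking and fixing VM-to-CSV-owner alignment is scriptable. A minimal sketch, with the VM and node names as placeholders:

```powershell
# Which node currently owns each Cluster Shared Volume?
Get-ClusterSharedVolume | Select-Object Name, OwnerNode

# Which node is each clustered VM running on?
Get-ClusterGroup | Where-Object GroupType -eq 'VirtualMachine' |
    Select-Object Name, OwnerNode

# Live-migrate a VM to the node that owns its volume (placeholder names)
Move-ClusterVirtualMachineRole -Name "VM01" -Node "Node1" -MigrationType Live
```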
When it comes to random read latency, Figure 2 reveals that Storage Spaces Direct with 100% of the workload within the mirror tier also excels, starting at a low 0.286 ms at 4 I/O queue depth and increasing to 2.897 ms at 128 I/O queue depth. StarWind VSAN starts at 0.570 ms, reaching 8.827 ms at the same depth.
Even with the workload split between the mirror and parity tiers, S2D maintains superior read latency, starting at 0.293 ms and peaking at 3.068 ms. The latency advantage of S2D is again attributed to local reads.
Switching gears to efficiency, Figure 3 compares IOPS per 1% CPU usage during a 4k random read test. Storage Spaces Direct with 100% of the workload in mirror tier proves highly efficient, delivering up to 21,704 IOPS per 1% CPU usage at 64 I/O queue depth, whereas StarWind VSAN peaks at 7,342 IOPS per 1% CPU usage at 128 I/O queue depth. This makes S2D approximately 196% more efficient.
Even when the workload spans both S2D tiers, mirror and parity, it maintains a strong efficiency advantage, reaching 19,909 IOPS per 1% CPU usage at 64 I/O queue depth.
4k random read/write 70/30:
Next, let’s dive into mixed 70/30 read-write patterns. Figure 4 is key for understanding real-world performance because pure read or write workloads are rare in actual production.
Figure 4 shows the number of IOPS during the mixed 70%/30% 4k random read/write tests with Numjobs = 3.
Interestingly, with Storage Spaces Direct, there’s a noticeable drop in performance at queue depths 4 and 8. This performance drop is not observed in StarWind VSAN tests. StarWind maintains consistent performance, hitting 334,000 IOPS at queue depth 4 and 301,200 IOPS at queue depth 8.
In contrast, with 100% of the workload in the mirror tier, S2D’s performance drops to 114,600 IOPS at queue depth 4 and 62,800 IOPS at queue depth 8, representing reductions of approximately 65.7% and 79.1%, respectively, compared to StarWind VSAN. Fortunately, S2D shows a significant rebound in performance starting at QD=16, ultimately scoring about 25% higher than StarWind VSAN under the same conditions.
However, when the workload is distributed across both mirror and parity tiers, S2D struggles due to ReFS continuously moving new data from the mirror tier to the parity tier, which negatively affects performance. As a result, S2D records 126,200 IOPS at queue depth 2, drops to a low of 49,800 IOPS at QD=8, and then peaks at 247,000 IOPS at queue depth 32.
Meanwhile, StarWind VSAN outperforms S2D, achieving 241,500 IOPS at queue depth 2 and 534,000 IOPS at queue depth 32, highlighting its superior performance with IOPS figures that are 91.4% higher at queue depth 2 and 116.2% higher at queue depth 32 compared to S2D in the “Mirror+Parity” scenario.
Figure 5 examines latency for the mixed 4K random 70/30 workload. Storage Spaces Direct with 100% of the workload in the mirror tier starts at 0.382 ms at 2 I/O queue depth, reaching 6.899 ms at 64 I/O queue depth. StarWind VSAN, on the other hand, starts at 0.582 ms and goes up to 7.768 ms.
Figure 6 explores the number of IOPS relative to 1% CPU utilization during the same mixed workload.
Storage Spaces Direct with workload within the mirror tier provides up to 10,180 IOPS per 1% CPU usage at 16 I/O queue depth, while StarWind VSAN peaks at 5,394 IOPS per 1% CPU usage. This makes S2D about 89% more efficient.
When the workload touches both tiers, mirror and parity, S2D achieves a maximum of 3,506 IOPS per 1% CPU usage at 64 I/O queue depth, about 35% lower than StarWind VSAN’s figure at the same queue depth.
4k random write:
The ability to maintain consistent write performance across various queue depths is crucial for demanding virtualization environments. Figure 7 shows the amount of IOPS during 4k random write operations.
StarWind VSAN stands out with consistent performance across most queue depths, significantly outperforming S2D (100% of the workload in the mirror tier) from I/O queue depth 2 to 16.
With S2D and 100% of the workload in the mirror tier, there’s a big drop in performance at queue depths 4 and 8. At queue depth 4, Storage Spaces Direct scores about 46,800 IOPS, which is more than 4 times lower than StarWind VSAN’s 192,000 IOPS. At queue depth 8, the gap widens even more, and StarWind VSAN ends up being 584% more effective. This result is unexpected, as we anticipated better performance from S2D in this test than when the workload spans both mirror and parity tiers.
Interestingly, at queue depth 32 StarWind VSAN loses the advantage, scoring 167,000 IOPS, while S2D with 100% of the data in the mirror tier gains traction, achieving 186,000 IOPS. That being said, when the workload hits both tiers, S2D is unable to show better figures and ends up at 74,900 IOPS.
Latency for 4K random writes is also a critical factor. Since latency corresponds to prior IOPS results, the overall picture remains consistent in Figure 8.
StarWind VSAN demonstrates the lowest latency, starting at 0.859 ms at a 2 I/O queue depth and increasing to 3.689 ms at a 16 I/O queue depth. It significantly outperforms Storage Spaces Direct with 100% of the workload in the mirror tier, which starts at 1.499 ms (StarWind’s latency is about 43% lower) and rises to 14.876 ms (about 75% lower) in the QD=16 test.
When comparing StarWind to S2D with the workload spanning both tiers, mirror and parity, the gap is even more noticeable, with StarWind showing 63% lower latency at 2 I/O queue depth (0.859 ms vs. 2.324 ms) and 55% lower latency at 32 I/O queue depth (11.476 ms vs. 25.659 ms). The only exception is at 32 I/O queue depth against the mirror-only configuration, where StarWind VSAN shows slightly higher latency: 11.476 ms versus S2D’s 10.174 ms, about 11% in S2D’s favor.
Efficiency in 4K random write workloads is measured in IOPS per 1% CPU usage, as shown in Figure 9.
Storage Spaces Direct with 100% of the workload in the mirror tier achieves up to 5,167 IOPS per 1% CPU usage, while StarWind VSAN peaks at 3,093 IOPS per 1% CPU usage, making S2D approximately 67% more efficient. However, when the workload utilizes both S2D tiers, mirror and parity, efficiency drops significantly, with a maximum of only 1,291 IOPS per 1% CPU usage, making it the least efficient of the three scenarios.
64k random read:
Moving to larger data blocks, Figure 10 illustrates the throughput performance for 64K random reads.
Storage Spaces Direct with 100% of the workload in the mirror tier significantly outpaces StarWind VSAN, achieving a peak of 53,500 MiB/s at 32 I/O queue depth compared to StarWind’s 14,562 MiB/s. This indicates that S2D delivers approximately 267% more throughput.
When the workload utilizes both S2D tiers, mirror and parity, it shows slightly lower throughput but still surpasses StarWind VSAN significantly. The higher performance in S2D is attributed to local reads, but remember, this efficiency is conditional on the VM running on the node that owns the volume. StarWind VSAN, in contrast, provides stable performance regardless of VM placement, eliminating the need for additional monitoring and VM-to-host binding.
Figure 11 shows the latency for 64K random reads. The results align with the throughput data discussed earlier.
Here, S2D with 100% of the workload in the mirror tier maintains low latency due to local reads, starting at 0.376 ms and reaching 2.243 ms at 32 I/O queue depth. StarWind VSAN starts higher at 0.749 ms and peaks at 8.343 ms, meaning S2D’s latency is up to 73% lower.
In Figure 12, we examine CPU usage during 64K random reads.
Storage Spaces Direct with 100% of the workload in the mirror tier starts at 17% CPU usage at 2 I/O queue depth and peaks at 38% at 32 I/O queue depth. When the workload hits both tiers, mirror and parity, S2D shows consistent CPU usage trends, closely following the S2D “mirror-only” test results.
StarWind VSAN begins significantly higher at 35% and peaks at 41%, slightly above S2D. This indicates that S2D is more efficient in CPU usage, with StarWind VSAN using up to roughly 106% more CPU at the lower queue depths (2 to 16) and about 8% more at a 32 I/O queue depth.
64k random write:
Figure 13 illustrates the 64K random write throughput, highlighting performance differences across three scenarios.
Storage Spaces Direct with 100% of the workload in the mirror tier exhibits erratic performance, with notable drops at medium I/O queue depths. For example, throughput falls to 2,019 MiB/s at a 4 I/O queue depth, dips further to 1,457 MiB/s at 8 I/O queue depth, and then rebounds to 2,616 MiB/s at 16 I/O queue depth. This pattern reflects the behavior observed in 4K random write tests. Not stable. No good!
In contrast, StarWind VSAN delivers more consistent performance, surpassing S2D by 79.6% at a 4 I/O queue depth, by 167.6% at 8 I/O queue depth, and by 52% at 16 I/O queue depth.
When workloads span both tiers, mirror and parity, S2D shows significantly lower throughput across the board. StarWind VSAN outperforms S2D by 199% at a 1 I/O queue depth, with the performance gap widening to 343% at a 32 I/O queue depth.
This highlights StarWind’s capability to handle write operations with consistently high performance across varying I/O queue depths.
Figure 14 displays the latency for 64K random writes, showing a similar trend.
StarWind VSAN delivers faster response times at lower I/O queue depths (4, 8, and 16), but S2D (with 100% of the workload in the mirror tier) takes the lead at a 32 I/O queue depth, achieving a lower latency of 22.138 ms compared to StarWind’s 30.150 ms.
When compared to S2D’s configuration with workloads spread across both tiers, mirror and parity, StarWind VSAN is significantly more efficient, providing 66.9% faster response times at a 1 I/O queue depth and 77.3% lower latency at a 32 I/O queue depth.
Figure 15 highlights CPU usage during 64K random writes.
StarWind VSAN consistently shows higher CPU utilization compared to both Storage Spaces Direct configurations (Mirror-only and Mirror+Parity). At a 1 I/O queue depth, StarWind uses 25% CPU, which is 79% higher than S2D’s 14% with 100% of the workload in the mirror tier and 92% higher than S2D’s 13% in the mixed-tier test. This trend persists across different I/O queue depths, with StarWind maintaining higher CPU usage but delivering more consistent performance under varying workloads.
1M read:
Figure 16 presents the throughput results for 1024K reads, where S2D with 100% of the workload in the mirror tier significantly outperforms StarWind VSAN, reaching 52,000 MiB/s at a 16 I/O queue depth compared to StarWind’s 15,600 MiB/s — about 233% higher throughput.
Even when workloads are spread across both tiers, mirror and parity, S2D continues to outperform StarWind VSAN by a substantial margin. This impressive read performance from S2D is again due to local reads when the test VM is located on the CSV owner node.
Figure 17 shows the latency results during the 1024K read test, reflecting a pattern similar to the throughput results.
S2D with 100% of the workload in the mirror tier demonstrates impressively low latency, benefiting from local reads. Latency starts at 1.002 ms and increases to 6.098 ms as I/O queue depth grows.
In contrast, StarWind VSAN starts at 1.998 ms and peaks at 20.625 ms, up to roughly 240% higher latency than S2D (in other words, S2D’s latency is about 70% lower).
When workloads are distributed across both S2D tiers, mirror and parity, latency remains nearly identical to that of the mirror tier tests. It’s all thanks to the local reads!
Figure 18 highlights CPU usage during 1024K reads, where S2D demonstrates significantly lower resource consumption compared to StarWind VSAN.
With workloads in the mirror tier, S2D starts at 5% CPU usage at a 1 I/O queue depth and increases to 16% at a 16 I/O queue depth.
In contrast, StarWind VSAN begins at 26% and rises to 33%, meaning S2D uses about 63% less CPU on average.
Even when workloads span both tiers, mirror and parity, S2D maintains the same CPU usage levels as in the mirror-only benchmarks. S2D’s efficiency in local reads translates to more effective CPU usage, while StarWind requires more resources to sustain consistent performance.
1M write:
When we shift our focus to 1024K sequential write throughput, Figure 19 reveals that Storage Spaces Direct (S2D) with 100% of the workload in the mirror tier holds a significant performance advantage over StarWind VSAN. Specifically, S2D reaches 8,693 MiB/s at a 2 I/O queue depth, while StarWind VSAN manages 3,903 MiB/s. At an 8 I/O queue depth, S2D continues to dominate, hitting 8,559 MiB/s, compared to StarWind’s 4,156 MiB/s. This means S2D delivers approximately 122% higher throughput, if your workload fits into the mirror tier…
However, it’s important to note that this advantage is conditional. If the workload doesn’t fit into the mirror tier, S2D’s performance drops dramatically. This is evident in the multi-tiered test results, where StarWind VSAN outperforms Storage Spaces Direct by about 106% on average.
Diving into the 1024K write latency as depicted in Figure 20, we see a consistent theme.
With 100% of its workload in the mirror tier, Storage Spaces Direct begins at a brisk 2.400 ms and climbs to 18.684 ms at 8 I/O queue depth. In comparison, StarWind VSAN starts at a slower 5.804 ms and escalates to a higher peak of 38.492 ms, which means StarWind’s latency is up to 106% higher (S2D roughly halves it).
However, when the workload spans both the mirror and parity tiers, everything gets flipped upside down. S2D records the highest latencies, starting at 13.496 ms and surging to 73.178 ms at 8 I/O queue depth. This again indicates a significant performance shift depending on how well the workload is aligned with S2D’s optimal tier configuration.
Figure 21 highlights CPU usage during 1024K writes. With 100% of the workload in the mirror tier, Storage Spaces Direct (S2D) begins at 8% CPU usage at a queue depth of 1 I/O and consistently holds at 9% across queue depths of 2, 4, and 8 I/O.
In contrast, StarWind VSAN begins at a much higher 24% and remains steady around 25% across all I/O queue depths. This indicates that S2D consumes approximately 72% less CPU on average, demonstrating significantly more efficient resource utilization compared to StarWind VSAN.
When the workload spans both tiers of S2D, mirror and parity, it continues to exhibit even lower CPU usage, starting at just 4% at 1 and 2 I/O queue depths and modestly rising to 5% at 4 and 8 I/O queue depths.
Additional benchmarking: 1 VM, 1 numjobs, 1 iodepth
To gain a deeper understanding of how StarWind VSAN and Storage Spaces Direct perform under specific synthetic conditions, we conducted additional benchmarks focusing on a single VM scenario, with numjobs = 1 and an I/O queue depth of 1. Typically, this is the best way to measure storage access latency.
Benchmark results in a table:
StarWind VSAN, Host mirroring + RAID5, 1 VM

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
---|---|---|---|---|---
4k random read | 1 | 1 | 1,112 | 4 | 0.897
4k random write | 1 | 1 | 501 | 2 | 1.991
4k random write (synchronous) | 1 | 1 | 226 | 1 | 4.415

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror tier (1 VM)

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
---|---|---|---|---|---
4k random read | 1 | 1 | 7,221 | 28 | 0.137
4k random write | 1 | 1 | 5,456 | 21 | 0.182
4k random write (synchronous) | 1 | 1 | 2,887 | 11 | 0.344

Storage Spaces Direct (TCP) – Nested mirror accelerated parity – Data in mirror and parity tiers (1 VM)

Pattern | Numjobs | IOdepth | IOPS | MiB/s | Latency (ms)
---|---|---|---|---|---
4k random read | 1 | 1 | 5,920 | 23 | 0.167
4k random write | 1 | 1 | 2,517 | 10 | 0.395
4k random write (synchronous) | 1 | 1 | 1,772 | 7 | 0.562
Benchmark results in graphs:
This section presents visual comparisons of the performance and latency metrics across storage configurations under research.
4k random read:
Figure 1 demonstrates IOPS for the 4K random read test at 1 I/O queue depth and numjobs = 1. S2D with 100% of the workload in the mirror tier outperforms StarWind VSAN, delivering 7,221 IOPS.
This remarkable 550% increase over StarWind’s 1,112 IOPS is primarily due to S2D’s ability to leverage local reads and operate entirely at the host level. In contrast, StarWind VSAN, running inside a VM and mixing local and network I/O, usually faces a much longer data path, which impacts its performance negatively.
Even when S2D operates with data across both mirror and parity tiers, it maintains strong performance at 5,920 IOPS, still surpassing StarWind by 432%.
Latency metrics for the 4K random read test at 1 I/O queue depth, as shown in Figure 2, similarly favor Storage Spaces Direct with 100% of the workload in the mirror tier, which records a swift 0.137 ms against StarWind VSAN’s 0.897 ms, roughly 6.5 times lower. This advantage is again due to S2D’s local read capabilities and direct host-level operation. Even in a mixed-tier setup, mirror and parity, S2D maintains its lead with a latency of 0.167 ms, still more than 5 times lower than StarWind’s result.
Figure 3 showcases the results of the 4K random write test at I/O queue depth=1 with a numjob=1. Storage Spaces Direct with 100% of the workload in the mirror tier achieves a remarkable 5,456 IOPS, an astounding 990% higher than StarWind VSAN’s 501 IOPS. This significant advantage stems from S2D’s ability to write directly to the mirror tier, bypassing the resource-intensive parity calculation and avoiding read-modify-write.
However, when S2D handles workloads across both mirror and parity tiers, performance drops to 2,517 IOPS due to the additional overhead of invalidating data in the parity tier. For a deeper dive into how reading and writing function in a mirror-accelerated parity scenario, please refer to the detailed explanation provided here.
On the other hand, StarWind VSAN, which writes directly to the RAID5 virtual LUN, experiences performance degradation due to the read-modify-write (RMW) operations and the extended I/O data path inherent in its VM-based operation. Despite these technical challenges, the performance of StarWind VSAN at queue depth of 1 appears unusually low, prompting us to initiate an investigation into this issue to uncover the underlying cause.
Moving on to Figure 4, we examine the latency metrics for 4K random writes.
No surprises here. Storage Spaces Direct (S2D) continues to deliver superior performance, benefiting from its efficient data handling within the mirror tier and achieving an impressively low latency of 0.182 ms. This represents a staggering 995% improvement over StarWind VSAN’s 1.991 ms.
Even when S2D operates across both mirror and parity tiers, it maintains a competitive latency of 0.395 ms, still outperforming StarWind by 404%.
In our synchronous 4K random write single-threaded tests, as shown in Figure 5, Storage Spaces Direct with the dataset entirely in the mirror tier once again takes the lead, achieving 2,887 IOPS, a staggering 1,177% increase over StarWind VSAN’s 226 IOPS.
This significant performance boost is attributed to the same factors observed in asynchronous 4K random write test, where S2D benefits from direct writes to the mirror tier, effectively bypassing the resource-heavy parity calculations and avoiding read-modify-write.
Even in the “mixed tiers” setup, S2D maintains a strong advantage, delivering 1,772 IOPS and still outpacing StarWind by 684%.
Figure 6 highlights the latency results for the synchronous 4K random write single-threaded test, further confirming S2D’s performance edge.
With 100% of the workload within the mirror tier, Storage Spaces Direct achieves a write latency of 0.344 ms, roughly 12.8 times lower than StarWind VSAN’s 4.415 ms.
Even when using the mirror and parity tiers, S2D maintains a strong latency advantage at 0.562 ms, outpacing StarWind by 686%. This superior performance stems from S2D’s efficient data handling, consistently delivering lower latency across varying configurations.
Conclusion
To sum it up, both Storage Spaces Direct and StarWind VSAN come with their own set of perks and trade-offs for your IT infrastructure.
Storage Spaces Direct shines in read performance, particularly when virtual machines are aligned with CSV owner nodes. However, we observed some unexpected performance issues during 4K and 64K random-write tests, where S2D sometimes underperformed with data in the mirror tier compared to when it spanned both mirror and parity tiers. This highlights the need for careful monitoring and deliberate VM placement to ensure optimal performance. Mismanagement of workloads can lead to significant performance drops, particularly at certain queue depths. Additionally, S2D requires extra space for fault tolerance, which can impact overall storage efficiency.
On the other hand, StarWind VSAN proves to be a solid choice for high-performance environments, especially with mixed read/write or write-heavy workloads. It consistently delivers superior write performance under load, regardless of VM placement, and offers better capacity efficiency. However, StarWind VSAN lacks the local read boost that S2D provides, can be more demanding on CPU resources, and showed some anomalies in single-threaded tests.
So, if you’re looking for exceptional read performance and don’t mind keeping a close eye on your workloads, S2D is a great option. But if you’re after consistent write and mixed I/O performance with better capacity efficiency, all in a rugged ‘fire-and-forget’ mode, StarWind VSAN is the way to go.
Stay tuned for our upcoming articles, where we’ll dive deeper into these solutions to give you a bigger picture of how they can fit into your IT strategy.