StarWind VSAN vs Microsoft S2D: Performance Comparison

Introduction

Choosing the right data storage solution can make or break your system’s performance and reliability. If you’re working in a Hyper-V environment, you’ve probably heard of StarWind Virtual SAN (VSAN) and Microsoft Storage Spaces Direct (S2D). But which one should you go for? In this article, we dive deep into these two solutions, comparing their performance, capacity efficiency, and practical application in a 2-node Hyper-V cluster setup. By the end, you’ll have a clearer picture of which solution might be your perfect match.

To compare these two solutions fairly, we set up a 2-node Hyperconverged Infrastructure (HCI) Hyper-V cluster under two different configurations:

StarWind VSAN NVMe-oF over TCP

Host Mirroring + MDRAID-5.

Microsoft Storage Spaces Direct over TCP

Nested mirror-accelerated parity, workload placed in the mirror tier.
Nested mirror-accelerated parity, workload placed in both tiers – mirror and parity.

Solutions overview

StarWind VSAN for Hyper-V NVMe-oF over TCP scenario:

In this setup, each Hyper-V node is equipped with 5x NVMe drives passed through to the StarWind Controller Virtual Machine (CVM). Inside the CVM, the drives are assembled into an MDRAID5 array. On top of this array, two StarWind High Availability (HA) devices are created, ensuring data replication and continuous availability. StarWind NVMe-oF Initiator, chosen due to the lack of a native Microsoft NVMe-oF initiator (Microsoft is expected to introduce support for NVMe-oF in Windows Server 2025 but with TCP support only), connects these devices to the nodes. Cluster Shared Volumes (CSVs) are then created on these connected devices.

Microsoft Storage Spaces Direct over TCP scenario – Nested mirror-accelerated parity:

DISCLAIMER: We’re aware that disk witness isn’t officially supported with S2D. However, for the sake of our benchmarking and to speed up deployment, we chose to proceed with it. That said, do not use disk witness in your production S2D cluster.

This scenario tested two configurations of S2D, focusing on a Nested mirror-accelerated parity that provides the optimal balance between performance and capacity efficiency:

Workload placed in the mirror tier: Maximizes performance by keeping data in the faster mirror tier.
Workload placed in both tiers: Simulates a more balanced scenario where data moves between the mirror and parity tiers, reflecting real-world conditions (when the workload does not fit in the mirror tier and Resilient File System (ReFS) begins to move data to the parity tier). We also tried to achieve a behavior where writes were sent directly to the parity tier – the worst-case scenario.

In reality, with production workloads, the performance will likely fall somewhere between these two cases.

Two storage tiers are created with different resiliency settings – Mirror for performance and Parity for capacity – with the following parameters:

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedPerformance -ResiliencySettingName Mirror -MediaType SSD -NumberOfDataCopies 4

New-StorageTier -StoragePoolFriendlyName s2d-pool -FriendlyName NestedCapacity -ResiliencySettingName Parity -MediaType SSD -NumberOfDataCopies 2 -PhysicalDiskRedundancy 1 -NumberOfGroups 1 -FaultDomainAwareness StorageScaleUnit -ColumnIsolation PhysicalDisk -NumberOfColumns 4


Volumes are allocated with 20% in the mirror tier and 80% in the parity tier, adhering to Microsoft’s recommendations:

New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume01 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB

New-Volume -StoragePoolFriendlyName s2d-pool -FriendlyName Volume02 -StorageTierFriendlyNames NestedPerformance, NestedCapacity -StorageTierSizes 820GB, 3276GB


ReFS manages the data movement between these tiers to optimize performance. The threshold value, at which ReFS starts moving data between the tiers, was left at the default – 85%.

Capacity efficiency

Capacity efficiency is a big deal when evaluating storage solutions:

StarWind VSAN for Hyper-V NVMe-oF
Achieves a capacity efficiency of 40%, thanks to its combination of host mirroring and MDRAID-5.
Microsoft S2D Nested mirror-accelerated parity
Delivers a capacity efficiency of 35.7% (20% mirror, 80% parity), though this can vary depending on the percentage of the volume allocated to the mirror tier. For more details on how to calculate capacity efficiency for Nested mirror-accelerated parity, please refer to Microsoft’s nested resiliency documentation.
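
Both efficiency figures can be reproduced with a little arithmetic. The Python sketch below is illustrative only and uses hypothetical function names. For StarWind it assumes a five-drive RAID-5 per node (four drives' worth of data, one of parity) mirrored across two hosts. For S2D it assumes a nested two-way mirror efficiency of 25% (four data copies, as in the tier definition) and a nested parity efficiency of 40%; the latter is an assumption chosen because it reproduces the 35.7% quoted here.

```python
def starwind_efficiency(drives_per_node: int, mirror_copies: int = 2) -> float:
    """RAID-5 keeps (n - 1) / n of raw space; host mirroring divides that by the copy count."""
    raid5 = (drives_per_node - 1) / drives_per_node
    return raid5 / mirror_copies


def nested_map_efficiency(mirror_share: float,
                          mirror_eff: float = 0.25,            # assumption: 4 data copies
                          parity_eff: float = 0.40) -> float:  # assumption: fits the quoted 35.7%
    """Blend the tier efficiencies by the raw footprint each usable share consumes."""
    parity_share = 1.0 - mirror_share
    footprint = mirror_share / mirror_eff + parity_share / parity_eff
    return 1.0 / footprint


print(f"StarWind VSAN (host mirror + RAID-5): {starwind_efficiency(5):.1%}")
print(f"S2D nested MAP (20% mirror):          {nested_map_efficiency(0.20):.1%}")
```

Because the blend is weighted by raw footprint rather than by usable share, growing the mirror tier lowers the overall efficiency faster than a simple weighted average would suggest.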

Microsoft recommends leaving some capacity in the storage pool unallocated to give volumes space to repair “in-place” after drive failure. If sufficient capacity exists, an immediate, in-place, parallel repair can restore volumes to full resiliency even before the failed drives are replaced. This happens automatically. In our setup, the recommended reserve space is 5.82 TB (20% of the total pool size).
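
The reserve figure can be cross-checked against the raw drive capacities. This is a minimal sketch, assuming the pool spans all ten 3.2 TB drives and that the 5.82 figure is expressed in binary units (TiB, which Windows tools label as TB); the size Windows actually reports for the pool may differ slightly because of metadata overhead.

```python
TB = 10**12    # decimal terabyte, as printed on the drive label
TIB = 2**40    # binary tebibyte (assumption: the unit behind the quoted 5.82 figure)

nodes = 2
drives_per_node = 5
drive_size_tb = 3.2
reserve_fraction = 0.20   # 20% of the pool, as stated in the text

pool_bytes = nodes * drives_per_node * drive_size_tb * TB
pool_tib = pool_bytes / TIB
reserve_tib = pool_tib * reserve_fraction

print(f"Raw pool: {pool_tib:.2f} TiB, recommended reserve: {reserve_tib:.2f} TiB")
```

With five drives per node, 20% of the pool is the same as one drive's worth of capacity per server.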

When planning your solution, consider these factors to ensure you get the best performance and efficiency for your needs.

Testbed overview

When it comes to evaluating storage solutions like StarWind VSAN for Hyper-V NVMe-oF over TCP and Microsoft S2D over TCP, we didn’t cut any corners. Our testbed was robust and meticulously configured to simulate real-world environments, ensuring our findings are relevant and reliable. Here’s a breakdown of the hardware and software setups that powered our tests:

Hardware:

Server model	Supermicro SYS-220U-TNR
CPU	Intel(R) Xeon(R) Platinum 8352Y @2.2GHz
Sockets	2
Cores/Threads	64/128
RAM	256GB
NICs	2x Mellanox ConnectX®-6 EN 200GbE (MCX613106A-VDA)
Storage	5x NVMe Micron 7450 MAX: U.3 3.2TB

Software:

Windows Server	Windows Server 2022 Datacenter 21H2 OS build 20348.2527
StarWind VSAN	Version V8 (build 15469, CVM 20240530) (kernel – 5.15.0-113-generic)
StarWind NVMe-oF Initiator	StarWind NVMe-oF Initiator.2.0.0.672(rev 674).Setup.486

StarWind CVM parameters:

CPU	24 vCPU
RAM	32GB
NICs	1x network adapter for management; 4x network adapters for client IO and synchronization
Storage	MDRAID5 (5x NVMe Micron 7450 MAX: U.3 3.2TB)

Testing methodology

The benchmarks were conducted using the FIO utility in the client/server mode. We configured a total of 20 virtual machines (VMs), with 10 VMs hosted on each server node. Each VM was allocated 4 vCPUs, 8GB of RAM, and three RAW virtual disks connected to separate SCSI controllers.

Test Scenarios:

Microsoft Storage Spaces Direct (S2D)

Nested mirror-accelerated parity (Mirror-only): For scenarios where the workload is placed entirely in the mirror tier, each virtual disk size was 10GB.
Nested mirror-accelerated parity (Both tiers): For scenarios utilizing both the mirror and parity tiers, each virtual disk size was 100GB.

StarWind VSAN NVMe-oF

For all tests, each virtual disk size was 100GB.
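
The two S2D disk sizes were chosen so that the working set either fits in the mirror tier or spills into parity. The sketch below checks that with the tier sizes from the New-Volume commands. It assumes the test data is spread evenly across the two volumes and that the 85% ReFS threshold applies to how full the mirror tier is; both are assumptions, and the helper names are hypothetical.

```python
MIRROR_TIER_GB = 820     # per volume, from the New-Volume commands
REFS_THRESHOLD = 0.85    # default threshold quoted in the article
VOLUMES = 2
VMS = 20
DISKS_PER_VM = 3


def per_volume_working_set_gb(disk_size_gb: int) -> float:
    """Total test data, split evenly across the volumes (assumption)."""
    return VMS * DISKS_PER_VM * disk_size_gb / VOLUMES


def stays_in_mirror(disk_size_gb: int) -> bool:
    """True if the working set stays below the mirror-tier rotation threshold."""
    return per_volume_working_set_gb(disk_size_gb) <= MIRROR_TIER_GB * REFS_THRESHOLD


for size in (10, 100):
    ws = per_volume_working_set_gb(size)
    print(f"{size} GB disks: {ws:.0f} GB per volume, mirror-only: {stays_in_mirror(size)}")
```

Under these assumptions the 10GB disks total 300 GB per volume, well below the threshold, while the 100GB disks total 3,000 GB per volume and necessarily land in both tiers.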

Data Patterns Tested:

4k random read
4k random read/write (70/30)
4k random write
64k random read
64k random write
1M read
1M write

Pre-Test Warm-Up:

Before running specific tests, we filled the virtual disks with random data and warmed them up using the corresponding patterns to ensure stable performance:

4k random read/write (70/30) and 4k random write: VM disks were warmed up with a 4k random write pattern for 4 hours.
64k random write: VM disks were warmed up with a 64k random write pattern for 2 hours.

Test Execution:

Duration: Read tests were conducted for 600 seconds, and write tests lasted 1800 seconds.
Repetition: All tests were repeated three times, and the average value was used as the final result.
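
The test plan can be written down as a small matrix. The sketch below is an illustration, not the job files actually used: the option names (rw, bs, rwmixread, numjobs, iodepth, runtime, time_based) are standard fio options, but the exact set of options in the real runs is not given in the article, and treating the mixed 70/30 pattern as a write test for timing purposes is an assumption.

```python
from statistics import mean
from typing import List, Optional

READ_RUNTIME_S = 600     # read tests
WRITE_RUNTIME_S = 1800   # write tests

# (label, fio rw mode, block size, read percentage for the mixed pattern)
PATTERNS = [
    ("4k random read", "randread", "4k", None),
    ("4k random read/write (70/30)", "randrw", "4k", 70),
    ("4k random write", "randwrite", "4k", None),
    ("64k random read", "randread", "64k", None),
    ("64k random write", "randwrite", "64k", None),
    ("1M read", "read", "1M", None),
    ("1M write", "write", "1M", None),
]


def fio_args(rw: str, bs: str, mix: Optional[int], numjobs: int, iodepth: int) -> List[str]:
    """Build fio command-line options for one cell of the matrix."""
    # Assumption: anything that writes (including the mixed pattern) uses the long runtime.
    runtime = READ_RUNTIME_S if rw in ("randread", "read") else WRITE_RUNTIME_S
    args = [f"--rw={rw}", f"--bs={bs}", f"--numjobs={numjobs}",
            f"--iodepth={iodepth}", f"--runtime={runtime}", "--time_based"]
    if mix is not None:
        args.append(f"--rwmixread={mix}")
    return args


def final_result(runs: List[float]) -> float:
    """Average of the repeated runs, as described under Repetition."""
    return mean(runs)


for label, rw, bs, mix in PATTERNS:
    print(label, "->", "fio", " ".join(fio_args(rw, bs, mix, numjobs=3, iodepth=4)))
```

The numjobs and iodepth values passed in the loop are placeholders; the results tables list the values used for each row.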

Specific Configurations:

Microsoft Storage Spaces Direct (S2D)
Following Microsoft’s recommendations for the S2D scenario, test VMs were placed on the CSV owner node to avoid redirecting requests to another node, ensuring local data reads without using the network stack and providing less network utilization on writes. Each VHDX file was placed in different subdirectories to optimize ReFS metadata operations and reduce latency.
StarWind VSAN for Hyper-V NVMe-oF
VMs were evenly distributed across hosts without being pinned to the node that owns the volume. Each VHDX file was placed in different subdirectories to maintain consistent performance.

Benchmarking local NVMe performance

Before diving into the full evaluation, we ran a series of tests to check whether the NVMe drives lived up to their vendor’s promises. Here is the image with vendor-claimed performance:

Using the FIO utility in client/server mode, we checked how well the NVMe SSDs in our server performed in a local storage setup. Our local storage tests used different patterns to see how the NVMe SSDs handled different kinds of data. The following results have been achieved:

	1x NVMe Micron 7450 MAX: U.3 3.2TB
Pattern	Numjobs	IOdepth	IOPs	MiB/s	Latency (ms)
4k random read	6	32	997,000	3,894	0.192
4k random read/write 70/30	6	16	531,000	2,073	0.142
4k random write	4	4	385,000	1,505	0.041
64k random read	8	8	92,900	5,807	0.688
64k random write	2	1	27,600	1,724	0.072
1M read	1	8	6,663	6,663	1.200
1M write	1	2	5,134	5,134	0.389

Our tests showed that the NVMe drives lived up to what the vendor promised. Whether handling small 4k reads or large 1M writes, they delivered on speed and consistency.
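
The throughput column in the table above follows from IOPS and block size. A minimal sketch of that relationship, using three rows from the table (the function name is hypothetical, and small differences from the reported values come from rounding in the published IOPS figures):

```python
def throughput_mib_s(iops: float, block_size_kib: int) -> float:
    """Throughput in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_size_kib / 1024


rows = [  # (pattern, block size in KiB, measured IOPS, reported MiB/s)
    ("4k random read", 4, 997_000, 3_894),
    ("64k random read", 64, 92_900, 5_807),
    ("1M write", 1024, 5_134, 5_134),
]

for name, bs_kib, iops, reported in rows:
    computed = throughput_mib_s(iops, bs_kib)
    print(f"{name}: computed {computed:.0f} MiB/s, reported {reported} MiB/s")
```

The same conversion applies to every results table in this article, which is why the IOPS and MiB/s columns are identical for the 1M patterns.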

Benchmark results in a table

The benchmarking results are presented in tables to illustrate performance metrics such as IOPS, throughput (MiB/s), latency (ms), and CPU usage. An additional metric, “IOPS per 1% CPU usage,” highlights the performance dependency on the CPU usage for 4k random read/write patterns. This parameter is calculated using the following formula:

IOPS per 1% CPU usage = IOPS / Node count / Node CPU usage

Where:

IOPS represents the number of I/O operations per second for each pattern.
Node count is 2 nodes in our case.
Node CPU usage denotes the CPU usage of one node during the test.
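
As a quick worked example of the formula, the sketch below applies it to the first row of the StarWind VSAN results table (420,000 IOPS at 45% node CPU usage). The function name is hypothetical; CPU usage is passed as a percentage, not a fraction.

```python
def iops_per_cpu_percent(iops: float, node_count: int, node_cpu_percent: float) -> float:
    """IOPS / node count / node CPU usage, with CPU usage given in percent."""
    return iops / node_count / node_cpu_percent


# 4k random read, IO depth 4, StarWind VSAN: 420,000 IOPS, 2 nodes, 45% CPU per node.
print(round(iops_per_cpu_percent(420_000, 2, 45.0)))  # 4667
```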

By incorporating this additional metric, we aimed to provide deeper insights into how CPU usage correlates with IOPS, offering a more nuanced understanding of performance characteristics.

Now let’s delve into the detailed benchmark results for each storage configuration.

StarWind VSAN for Hyper-V NVMe-oF over TCP scenario

The table illustrates StarWind VSAN’s performance under various workload patterns and configurations. For 4k random reads, IOPS scored from 420,000 at lower queue depths to 881,000 at higher depths. In a mixed 4k random read/write (70/30) test, it achieves up to 561,000 IOPS, showcasing its prowess in handling mixed workloads.

In the 64k and 1M read/write patterns, StarWind VSAN NVMe-oF reaches up to 15,600 MiB/s (about 15.2 GiB/s) on 1M reads, demonstrating its ability to handle such workloads effectively.

VM count	Pattern	Numjobs	IOdepth	IOPs	MiB/s	Latency (ms)	Node CPU usage %	IOPs per 1% CPU usage
20	4k random read	3	4	420,000	1,641	0.570	45.00%	4,667
	4k random read	3	8	307,000	1,200	1.515	35.00%	4,386
	4k random read	3	16	546,000	2,134	1.736	50.00%	5,460
	4k random read	3	32	741,000	2,895	2.586	57.00%	6,500
	4k random read	3	64	836,000	3,265	4.567	58.00%	7,207
	4k random read	3	128	881,000	3,442	8.827	60.00%	7,342
	4k random read/write (70%/30%)	3	2	241,500	943	0.582	39.00%	3,096
	4k random read/write (70%/30%)	3	4	334,000	1,305	0.843	45.00%	3,711
	4k random read/write (70%/30%)	3	8	301,200	1,177	1.683	42.00%	3,586
	4k random read/write (70%/30%)	3	16	416,000	1,625	2.507	48.00%	4,333
	4k random read/write (70%/30%)	3	32	534,000	2,086	4.002	53.00%	5,038
	4k random read/write (70%/30%)	3	64	561,000	2,191	7.768	52.00%	5,394
	4k random write	3	2	139,000	541	0.859	33.00%	2,106
	4k random write	3	4	192,000	751	1.246	39.00%	2,462
	4k random write	3	8	238,000	928	2.018	44.00%	2,705
	4k random write	3	16	260,000	1,015	3.689	44.00%	2,955
	4k random write	3	32	167,000	653	11.476	27.00%	3,093
	64k random read	3	2	160,000	10,000	0.749	35.00%
	64k random read	3	4	200,000	12,500	1.205	39.00%
	64k random read	3	8	210,000	13,125	2.299	40.00%
	64k random read	3	16	228,000	14,250	4.203	41.00%
	64k random read	3	32	233,000	14,562	8.343	41.00%
	64k random write	3	1	44,000	2,751	1.350	25.00%
	64k random write	3	2	51,900	3,242	2.311	27.00%
	64k random write	3	4	58,300	3,645	4.108	28.00%
	64k random write	3	8	62,400	3,900	7.689	29.00%
	64k random write	3	16	63,600	3,975	12.070	29.00%
	64k random write	3	32	63,800	3,987	30.150	29.00%
	1024k read	1	1	10,000	10,000	1.998	26.00%
	1024k read	1	2	12,400	12,400	3.225	29.00%
	1024k read	1	4	14,100	14,100	5.668	31.00%
	1024k read	1	8	15,200	15,200	10.574	32.00%
	1024k read	1	16	15,600	15,600	20.625	33.00%
	1024k write	1	1	3,443	3,443	5.804	24.00%
	1024k write	1	2	3,903	3,903	10.241	25.00%
	1024k write	1	4	4,086	4,086	19.561	25.00%
	1024k write	1	8	4,156	4,156	38.492	25.00%

Overall, StarWind VSAN shows great performance at 4k random read/write patterns, consistent read and write performance regardless of VM location, and impressive capacity efficiency at 40%.

Microsoft Storage Spaces Direct over TCP scenario (Nested mirror-accelerated parity: Mirror-only)

The next table presents S2D’s performance with a Nested mirror-accelerated parity configuration, focusing on workloads in the mirror tier.

For 4k random read scenarios, IOPS peak at 2,653,000, showcasing exceptional read performance due to local reading. In the 4k random read/write (70/30) pattern, results reach up to 654,000 IOPS.

The 64k random read/write and 1M read/write tests maintain high throughput, with 53,500 MiB/s for 64k reads and 52,400 MiB/s for 1M reads. S2D shows exceptional read performance when VMs are on the volume-owning node, and robust write performance within the mirror tier. However, read performance declines if local reading requirements aren’t met, and unusual performance drops occur at certain queue depths. Additionally, write performance can drop if VMs are running on a node that is not the volume owner, necessitating careful monitoring.

VM count	Pattern	Numjobs	IOdepth	IOPs	MiB/s	Latency (ms)	Node CPU usage %	IOPs per 1% CPU usage
20	4k random read	3	4	833,000	3,256	0.286	27.00%	15,426
	4k random read	3	8	752,000	2,937	0.648	21.00%	17,905
	4k random read	3	16	1,083,000	4,230	0.884	29.00%	18,672
	4k random read	3	32	1,646,000	6,429	1.165	41.00%	20,073
	4k random read	3	64	2,344,000	9,158	1.637	54.00%	21,704
	4k random read	3	128	2,653,000	10,363	2.897	67.00%	19,799
	4k random read/write (70%/30%)	3	2	324,300	1,266	0.382	20.00%	8,108
	4k random read/write (70%/30%)	3	4	114,600	447	2.103	7.00%	8,186
	4k random read/write (70%/30%)	3	8	62,800	245	7.659	4.00%	7,850
	4k random read/write (70%/30%)	3	16	509,000	1,988	1.939	25.00%	10,180
	4k random read/write (70%/30%)	3	32	614,000	2,398	3.564	31.00%	9,903
	4k random read/write (70%/30%)	3	64	654,000	2,554	6.899	34.00%	9,618
	4k random write	3	2	80,300	314	1.499	9.00%	4,461
	4k random write	3	4	46,800	183	5.116	6.00%	3,900
	4k random write	3	8	34,800	136	13.788	4.00%	4,350
	4k random write	3	16	64,700	253	14.876	7.00%	4,621
	4k random write	3	32	186,000	728	10.174	18.00%	5,167
	64k random read	3	2	317,000	19,812	0.376	17.00%
	64k random read	3	4	498,000	31,125	0.478	25.00%
	64k random read	3	8	424,000	26,500	1.142	22.00%
	64k random read	3	16	623,000	38,937	1.539	27.00%
	64k random read	3	32	856,000	53,500	2.243	38.00%
	64k random write	3	1	85,700	5,355	0.693	14.00%
	64k random write	3	2	58,300	3,645	2.055	10.00%
	64k random write	3	4	32,300	2,019	7.435	5.00%
	64k random write	3	8	23,300	1,457	20.592	4.00%
	64k random write	3	16	41,800	2,616	22.939	6.00%
	64k random write	3	32	86,300	5,393	22.138	14.00%
	1024k read	1	1	19,900	19,900	1.002	5.00%
	1024k read	1	2	31,600	31,600	1.267	7.00%
	1024k read	1	4	43,700	43,700	1.825	11.00%
	1024k read	1	8	50,300	50,300	3.180	14.00%
	1024k read	1	16	52,400	52,400	6.098	16.00%
	1024k write	1	1	8,290	8,290	2.400	8.00%
	1024k write	1	2	8,693	8,693	4.614	9.00%
	1024k write	1	4	8,607	8,607	9.290	9.00%
	1024k write	1	8	8,559	8,559	18.684	9.00%

Microsoft Storage Spaces Direct over TCP scenario (Nested mirror-accelerated parity: Both Tiers)

The performance metrics for the dual-tier configuration in S2D highlight workload management across both mirror and parity tiers.

In 4k random read patterns, IOPS reach up to 2,500,000, showcasing excellent scalability. The 4k random read/write (70/30) pattern results show up to 247,000 IOPS.

For 64k random read/write and 1M read/write tests, the system maintains strong throughput, with 52,000 MiB/s for 64k reads and 50,100 MiB/s for 1M reads, demonstrating S2D’s robust capability to handle complex data operations across tiers. However, its write performance drops when workloads exceed the mirror tier.

VM count	Pattern	Numjobs	IOdepth	IOPs	MiB/s	Latency (ms)	Node CPU usage %	IOPs per 1% CPU usage
20	4k random read	3	4	814,000	3,179	0.293	27.00%	15,074
	4k random read	3	8	739,000	2,886	0.642	26.00%	14,212
	4k random read	3	16	1,003,000	3,918	0.956	29.00%	17,293
	4k random read	3	32	1,556,000	6,078	1.232	42.00%	18,524
	4k random read	3	64	2,190,000	8,554	1.749	55.00%	19,909
	4k random read	3	128	2,500,000	9,766	3.068	68.00%	18,382
	4k random read/write (70%/30%)	3	2	126,200	492	1.245	27.00%	2,337
	4k random read/write (70%/30%)	3	4	108,200	422	2.442	23.00%	2,352
	4k random read/write (70%/30%)	3	8	49,800	195	9.766	10.00%	2,490
	4k random read/write (70%/30%)	3	16	225,800	882	5.690	37.00%	3,051
	4k random read/write (70%/30%)	3	32	247,000	965	11.251	38.00%	3,250
	4k random read/write (70%/30%)	3	64	231,400	903	25.634	33.00%	3,506
	4k random write	3	2	51,400	201	2.324	24.00%	1,071
	4k random write	3	4	58,600	229	4.094	26.00%	1,127
	4k random write	3	8	58,600	229	8.170	26.00%	1,127
	4k random write	3	16	74,800	292	12.868	29.00%	1,290
	4k random write	3	32	74,900	293	25.659	29.00%	1,291
	64k random read	3	2	316,000	19,750	0.378	18.00%
	64k random read	3	4	488,000	30,500	0.490	26.00%
	64k random read	3	8	377,000	23,560	1.296	22.00%
	64k random read	3	16	601,000	37,562	1.596	27.00%
	64k random read	3	32	832,000	52,000	2.307	38.00%
	64k random write	3	1	14,700	919	4.078	13.00%
	64k random write	3	2	15,200	950	7.883	17.00%
	64k random write	3	4	14,400	900	16.656	17.00%
	64k random write	3	8	14,600	913	32.938	17.00%
	64k random write	3	16	14,700	919	65.238	18.00%
	64k random write	3	32	14,400	900	132.694	18.00%
	1024k read	1	1	19,900	19,900	1.002	5.00%
	1024k read	1	2	31,600	31,600	1.230	8.00%
	1024k read	1	4	42,400	42,400	1.882	11.00%
	1024k read	1	8	47,600	47,600	3.363	13.00%
	1024k read	1	16	50,100	50,100	6.379	16.00%
	1024k write	1	1	1,482	1,482	13.496	4.00%
	1024k write	1	2	1,573	1,573	25.448	4.00%
	1024k write	1	4	2,295	2,295	34.817	5.00%
	1024k write	1	8	2,187	2,187	73.178	5.00%

Overall, S2D shows exceptional performance in both test cases; however, its storage capacity efficiency is about 35.7% and could be even lower if additional space is set aside for in-place repairs.

Benchmarking results in graphs

With all benchmarks completed and data collected, we can now compare the results using graphical charts for a clearer understanding.

4k random read:

Figure 1: 4K RR (IOPS)

Let’s start with the 4K random read test, where Figure 1 showcases the performance in IOPS. The S2D in the Nested mirror-accelerated parity configuration with workload in the mirror tier reaches a remarkable 833,000 IOPS at 4 IO depth, scaling up to 2,653,000 IOPS at 128 IO depth.

Comparatively, the StarWind VSAN NVMe-oF HA scenario peaks at 881,000 IOPS at 128 IO depth. Here, S2D outshines StarWind VSAN with approximately 200% more IOPS at higher depths.

So, what’s the magic behind S2D’s performance boost? It’s all about local reading. In a cluster shared volume (CSV) setup, S2D leverages the SMB 3.0 protocol to allow multiple hosts to access and perform I/O operations on a shared volume (if you want to explore this topic in more detail, please read here or check this article). If a VM is running on the node that owns the volume, it can read data directly from the local disk, bypassing the network stack. This local read path minimizes latency and maximizes performance, leading to impressive IOPS numbers.

However, there’s a catch. This local reading perk only works if the VM is on the volume-owning node. If not, the read operations have to go through the network to the owning node, which can slow things down. To keep things running smoothly, you need to keep an eye on where your VMs are running and move them to the appropriate nodes as necessary.

Figure 2: 4K RR (Latency)

When it comes to random read latency, Figure 2 reveals that Storage Spaces Direct with the workload within the mirror tier also excels, starting at a low 0.286 ms at 4 IO depth and increasing to 2.897 ms at 128 IO depth. StarWind VSAN NVMe-oF begins at 0.570 ms, reaching 8.827 ms at the same depth.

Even with the workload split between the mirror and parity tiers, S2D maintains superior read latency, starting at 0.293 ms and peaking at 3.068 ms. The latency advantage of S2D is again attributed to local reads.

Figure 3: 4K RR (IOPS per 1% CPU Usage)

Switching gears to efficiency, Figure 3 compares IOPS per 1% CPU usage during a 4k random read test. Storage Spaces Direct with the workload in mirror tier proves highly efficient, delivering up to 21,704 IOPS per 1% CPU usage at 64 IO depth, whereas StarWind VSAN NVMe-oF peaks at 7,342 IOPS per 1% CPU usage at 128 IO depth. This makes S2D approximately 196% more efficient.

Even when the workload spans both S2D tiers, it maintains a strong efficiency advantage, reaching 19,909 IOPS per 1% CPU usage at 64 IO depth.

4k random read/write 70/30:

Figure 4: 4K RR/RW 70%/30% (IOPS)

Next, let’s dive into mixed 70/30 read-write patterns. Figure 4 is key for understanding real-world performance because pure read or write workloads are rare in actual production.

Figure 4 shows the number of IOPS during the mixed 70%/30% 4k random read/write tests with Numjobs = 3.

Interestingly, with Storage Spaces Direct, there’s a noticeable drop in performance at queue depths 4 and 8. This performance drop is not observed in StarWind VSAN tests. StarWind maintains consistent performance, hitting 334,000 IOPS at queue depth 4 and 301,200 IOPS at queue depth 8.

In contrast, with the workload in the mirror tier, S2D’s performance drops to 114,600 IOPS at queue depth 4 and 62,800 IOPS at queue depth 8, representing reductions of approximately 65.7% and 79.1%, respectively, compared to StarWind VSAN. Fortunately, S2D rebounds starting at QD=16, scoring roughly 15% to 22% higher than StarWind VSAN at queue depths 16 through 64.

However, when the workload is distributed across both mirror and parity tiers, S2D struggles due to ReFS continuously moving new data from the mirror tier to the parity tier, which negatively impacts performance. As a result, S2D records 126,200 IOPS at queue depth 2, drops to a low of 49,800 IOPS at QD=8, and then peaks at 247,000 IOPS at queue depth 32.

Meanwhile, StarWind VSAN outperforms S2D, achieving 241,500 IOPS at queue depth 2 and 534,000 IOPS at queue depth 32, highlighting its superior performance with IOPS figures that are 91.4% higher at queue depth 2 and 116.2% higher at queue depth 32 compared to S2D in the “dual-tier” scenario.
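
The percentage comparisons in this section all come down to two small formulas, and "X% higher" is not interchangeable with "X% lower". The sketch below, with hypothetical helper names, reproduces three of the figures quoted above from the table values.

```python
def pct_higher(a: float, b: float) -> float:
    """How much higher a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def pct_lower(a: float, b: float) -> float:
    """How much lower a is than b, as a percentage of b."""
    return (1 - a / b) * 100


# StarWind VSAN vs. S2D (both tiers), 4k random 70/30
print(f"QD2:  {pct_higher(241_500, 126_200):.1f}% higher")
print(f"QD32: {pct_higher(534_000, 247_000):.1f}% higher")

# S2D (mirror tier) vs. StarWind VSAN at QD4
print(f"QD4:  {pct_lower(114_600, 334_000):.1f}% lower")
```

Note the asymmetry: StarWind being 91.4% higher at queue depth 2 means S2D is about 47.7% lower, not 91.4% lower.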

Figure 5: 4K RR/RW 70%/30% (Latency)

Figure 5 examines latency for the mixed 4K random 70/30 workload. Storage Spaces Direct with the workload in mirror tier starts at 0.382 ms at 2 IO depth, reaching 6.899 ms at 64 IO depth. StarWind VSAN NVMe-oF, on the other hand, starts at 0.582 ms and goes up to 7.768 ms.

Figure 6: 4K RR/RW 70%/30% (IOPS per 1% CPU Usage)

Figure 6 explores the number of IOPS relative to 1% CPU utilization during the same mixed workload.

Storage Spaces Direct with workload within the mirror tier provides up to 10,180 IOPS per 1% CPU usage at 16 IO depth, while StarWind VSAN NVMe-oF peaks at 5,394 IOPS per 1% CPU usage. This makes S2D about 89% more efficient.

When the workload touches both tiers, S2D achieves a maximum of 3,506 IOPS per 1% CPU usage at 64 IO depth, about 35% less than the 5,394 that the StarWind VSAN NVMe-oF HA scenario delivers at the same depth.

4k random write:

Figure 7: 4K RW (IOPS)

The ability to maintain consistent write performance across various queue depths is crucial for demanding virtualization environments. Figure 7 shows the amount of IOPS during 4k random write operations.

StarWind VSAN stands out with consistent performance across most queue depths, significantly outperforming S2D (workload in mirror tier) from IO depth 2 to 16.

With S2D and workload in the mirror tier, there’s a big drop in performance at queue depths 4 and 8. At queue depth 4, Storage Spaces Direct scores about 46,800 IOPS, which is more than 4 times lower than StarWind VSAN’s 192,000 IOPS. At queue depth 8, the gap widens even more, and StarWind VSAN delivers 584% more IOPS (238,000 vs. 34,800). This result is unexpected, as we anticipated better performance from S2D in this test compared to when the workload spans both mirror and parity tiers.

Interestingly, at queue depth 32 StarWind VSAN loses its advantage, scoring 167,000 IOPS, while S2D with data in the mirror tier gains traction and achieves 186,000 IOPS. That said, when the workload hits both tiers, S2D cannot match these figures and ends up at 74,900 IOPS.

Figure 8: 4K RW (Latency)

Latency for 4K random writes is also a critical factor. Since latency corresponds to prior IOPS results, the overall picture remains consistent in Figure 8.
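
The link between IOPS and latency can be made explicit with Little's law: average latency is roughly the number of outstanding I/Os divided by the IOPS rate. The sketch below is a consistency check under the assumption that every VM keeps numjobs × iodepth requests in flight for the whole run; the function name is hypothetical.

```python
def expected_latency_ms(vms: int, numjobs: int, iodepth: int, iops: float) -> float:
    """Little's law: latency = outstanding I/Os / throughput, converted to milliseconds."""
    outstanding = vms * numjobs * iodepth
    return outstanding / iops * 1000


# StarWind VSAN, 4k random write, 20 VMs, numjobs 3
for iodepth, iops, reported_ms in [(2, 139_000, 0.859), (16, 260_000, 3.689)]:
    est = expected_latency_ms(20, 3, iodepth, iops)
    print(f"IO depth {iodepth}: estimated {est:.3f} ms, reported {reported_ms} ms")
```

The estimates land within about half a percent of the reported values, which is why the latency charts largely mirror the IOPS charts.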

StarWind VSAN NVMe-oF demonstrates the lowest latency, starting at 0.859 ms at 2 IO depth and increasing to 3.689 ms at 16 IO depth. StarWind VSAN significantly outperforms Storage Spaces Direct with workload in the mirror tier, which starts at 1.499 ms (StarWind’s latency is about 43% lower) and rises to 14.876 ms (about 75% lower for StarWind) in the QD=16 test.

When comparing StarWind to S2D with workload within both tiers, the performance gap is even more pronounced, with StarWind showing 63% lower latency at 2 IO depth (0.859 ms vs. 2.324 ms) and 55% lower latency at 32 IO depth (11.476 ms vs. 25.659 ms). Only against the mirror-tier configuration at 32 IO depth does StarWind VSAN NVMe-oF show slightly higher latency: 11.476 ms compared to S2D’s 10.174 ms, which is about 11% lower.

Figure 9: 4K RW (IOPS per 1% CPU Usage)

Efficiency in 4K random write workloads is measured in IOPS per 1% CPU usage, as shown in Figure 9.

Storage Spaces Direct with workload in mirror tier achieves up to 5,167 IOPS per 1% CPU usage, while StarWind VSAN NVMe-oF peaks at 3,093 IOPS per 1% CPU usage, making S2D approximately 67% more efficient. However, when the workload utilizes both S2D tiers, efficiency drops significantly, with a maximum of only 1,291 IOPS per 1% CPU usage, making it the least efficient of the three scenarios.

64k random read:

Figure 10: 64K RR (Throughput)

Moving to larger data blocks, Figure 10 illustrates the throughput performance for 64K random reads.

Storage Spaces Direct with workload in mirror tier significantly outpaces the StarWind VSAN NVMe-oF HA scenario, achieving a peak of 53,500 MiB/s at 32 IO depth compared to StarWind’s 14,562 MiB/s. This indicates that S2D delivers approximately 267% more throughput.

When the workload utilizes both S2D tiers, it shows slightly lower throughput but still surpasses StarWind VSAN significantly. The higher performance in S2D is attributed to local reads, but remember, this efficiency is conditional on the VM running on the node that owns the volume. StarWind VSAN, in contrast, provides stable performance regardless of VM placement, eliminating the need for additional monitoring and VM binding.

Figure 11: 64K RR (Latency)

Figure 11 shows the latency for 64K random reads. The results align with the throughput data discussed earlier.

Here, S2D with workload in the mirror tier maintains low latency due to local reads, starting at 0.376 ms and reaching 2.243 ms at 32 IO depth. The StarWind VSAN NVMe-oF scenario starts higher at 0.749 ms and peaks at 8.343 ms, which makes S2D’s latency up to 73% lower.

Figure 12: 64K RR (CPU Usage)

In Figure 12, we examine CPU usage during 64K random reads.

Storage Spaces Direct with workload in the mirror tier starts at 17% CPU usage at 2 IO depth and peaks at 38% at 32 IO depth. When the workload hits both tiers, S2D shows consistent CPU usage trends, closely following the S2D “mirror-only” test results.

The StarWind VSAN NVMe-oF scenario begins significantly higher at 35% and peaks at 41%, slightly above S2D. This indicates that S2D is more efficient in CPU usage, with StarWind VSAN using approximately 106% more CPU at IO depth 2, between 52% and 82% more at IO depths 4 to 16, and about 8% more at IO depth 32.

64k random write:

Figure 13: 64K RW (Throughput)

Figure 13 illustrates the 64K random write throughput, highlighting performance differences across three scenarios.

Storage Spaces Direct with workload in the mirror tier exhibits erratic performance, with notable drops at medium IO depths. For example, throughput falls to 2,019 MiB/s at a 4 IO depth, dips further to 1,457 MiB/s at 8 IO depth, and then rebounds to 2,616 MiB/s at 16 IO depth. This pattern mirrors the behavior observed in 4K random write tests.

In contrast, StarWind VSAN delivers more consistent performance, surpassing S2D by 80.5% at a 4 IO depth, by 167.7% at 8 IO depth, and by 52% at 16 IO depth.

When workloads span both tiers, S2D shows significantly lower throughput across the board. StarWind VSAN outperforms S2D by 199% at a 1 IO depth, with the performance gap widening to 343% at a 32 IO depth.

This highlights StarWind’s capability to handle write operations with consistently high performance across varying IO depths.

Figure 14: 64K RW (Latency)

Figure 14 displays the latency for 64K random writes, showing a similar trend.

StarWind VSAN delivers faster response times at medium IO depths (4, 8, and 16), while S2D (with workloads in the mirror tier) is faster at IO depths 1 and 2 and takes the lead again at a 32 IO depth, achieving a lower latency of 22.138 ms compared to StarWind’s 30.150 ms.

When compared to S2D’s configuration with workloads spread across both tiers, StarWind VSAN is significantly more efficient, providing 66.9% faster response times at a 1 IO depth and 77.3% lower latency at a 32 IO depth.

Figure 15: 64K RW (CPU usage)

Figure 15 highlights CPU usage during 64K random writes.

StarWind VSAN consistently shows higher CPU utilization compared to both Storage Spaces Direct configurations. At a 1 IO depth, StarWind uses 25% CPU, which is 79% higher than S2D’s 14% with the workload in the mirror tier and 92% higher than S2D’s 13% in the mixed-tier test. This trend persists across different IO depths, with StarWind maintaining higher CPU usage but delivering more consistent performance under varying workloads.

1M read:

Figure 16: 1024K R (Throughput)

Figure 16 presents the throughput results for 1024K reads, where S2D with workload in the mirror tier significantly outperforms StarWind VSAN, reaching 52,400 MiB/s at a 16 IO depth compared to StarWind’s 15,600 MiB/s, or about 236% higher throughput.

Even when workloads are spread across both tiers, S2D continues to outperform StarWind VSAN by a substantial margin. This impressive read performance from S2D is again due to local reads when the test VM is located on the CSV owner node.

Figure 17: 1024K R (Latency)

Figure 17 shows the latency results during the 1024K read test, reflecting a pattern similar to the throughput results.

S2D with workloads in the mirror tier demonstrates impressively low latency, benefiting from local reads. Latency starts at 1.002 ms and increases to 6.098 ms as IO depth grows.

In contrast, StarWind VSAN starts at 1.998 ms and peaks at 20.625 ms. StarWind’s peak latency is therefore roughly 3.4 times higher than S2D’s 6.098 ms; put the other way, S2D’s peak latency is about 70% lower.

When workloads are distributed across both S2D tiers, latency remains nearly identical to that of the mirror tier tests.

Figure 18: 1024K R (CPU Usage)

Figure 18 highlights CPU usage during 1024K reads, where S2D demonstrates significantly lower resource consumption compared to StarWind VSAN.

With workloads in the mirror tier, S2D starts at 5% CPU usage at an IO depth of 1 and increases to 16% at an IO depth of 16.

In contrast, StarWind VSAN begins at 26% and rises to 33%, meaning S2D uses about 63% less CPU on average.

Even when workloads span both tiers, S2D maintains the same CPU usage levels as in the mirror-only benchmarks. S2D’s efficiency in local reads translates to more effective CPU usage, while StarWind requires more resources to sustain consistent performance.

1M write:

Figure 19: 1024K W (Throughput)

Turning to 1024K sequential write throughput, Figure 19 shows that Storage Spaces Direct (S2D) with the workload in the mirror tier holds a significant advantage over StarWind VSAN. S2D reaches 8,693 MiB/s at an IO depth of 2, while StarWind VSAN manages 3,903 MiB/s. At an IO depth of 8, S2D hits 8,559 MiB/s compared to StarWind’s 4,156 MiB/s. In other words, S2D delivers roughly 106% to 123% higher throughput, provided your workload fits in the mirror tier.

However, it’s important to note that this advantage is conditional. If the workload doesn’t fit into the mirror tier, S2D’s performance drops dramatically. This is evident in the multi-tiered test results, where StarWind VSAN outperforms Storage Spaces Direct by about 106% on average.

Figure 20: 1024K W (Latency)

Diving into the 1024K write latency as depicted in Figure 20, we see a consistent theme.

With its workload in the mirror tier, Storage Spaces Direct begins at 2.400 ms and climbs to 18.684 ms at an IO depth of 8. In comparison, StarWind VSAN starts at 5.804 ms and rises to a peak of 38.492 ms. StarWind’s peak latency is thus about 106% higher, which means S2D’s latency is roughly half of StarWind’s (about 51% lower at the peak).

However, when the workload spans both tiers, the picture changes. S2D records the highest latencies, starting at 13.496 ms and surging to 73.178 ms at an IO depth of 8. This again shows how strongly S2D’s performance depends on whether the workload fits its optimal tier configuration.

Figure 21: 1024K W (CPU Usage)

Figure 21 highlights CPU usage during 1024K writes. With the workload in the mirror tier, Storage Spaces Direct (S2D) starts at 8% CPU usage at an IO depth of 1 and holds at 9% at IO depths of 2, 4, and 8.

In contrast, StarWind VSAN begins at a much higher 24% and remains steady at around 25% across all IO depths. This indicates that S2D consumes approximately 72% less CPU on average, demonstrating significantly more efficient resource utilization compared to StarWind VSAN.

When the workload spans both tiers, S2D exhibits even lower CPU usage, starting at just 4% at IO depths of 1 and 2 and rising modestly to 5% at IO depths of 4 and 8.

Additional benchmarking: 1 VM, 1 numjobs, 1 iodepth.

To better understand how StarWind VSAN and Storage Spaces Direct perform under specific synthetic conditions, we ran additional benchmarks on a single VM with numjobs = 1 and an IO depth of 1. With only one IO in flight at a time, this is typically the best way to measure storage access latency in a best-case scenario.
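
For illustration, a single-job, single-IO test of this kind could be described as follows. This is only a sketch: it assumes fio as the load generator (the parameter names numjobs and iodepth suggest it), and the target path, runtime, job name, and the use of `--sync=1` for the synchronous variant are placeholders and assumptions, not the settings used in this article.

```python
from typing import List


def fio_qd1_command(pattern: str, synchronous: bool = False,
                    target: str = "/path/to/testfile") -> List[str]:
    """Build an fio command line for a 4k test with one job and IO depth 1.

    `pattern` is an fio --rw value such as "randread" or "randwrite".
    The target, runtime, and sync handling are illustrative assumptions.
    """
    cmd = [
        "fio",
        "--name=qd1-latency",     # arbitrary job name
        f"--filename={target}",   # placeholder test file or device
        f"--rw={pattern}",
        "--bs=4k",
        "--numjobs=1",            # a single worker
        "--iodepth=1",            # a single IO in flight
        "--direct=1",             # bypass the page cache
        "--time_based",
        "--runtime=60",           # assumed duration in seconds
    ]
    if synchronous:
        cmd.append("--sync=1")    # assumption: synchronous writes via O_SYNC
    return cmd


for pattern, sync in [("randread", False), ("randwrite", False), ("randwrite", True)]:
    print(" ".join(fio_qd1_command(pattern, sync)))
```

The script only prints the command lines; it does not run fio, so it is safe to adapt the target and duration before executing anything.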

Benchmark results in a table

StarWind VSAN NVMe-oF HA (TCP) – Host mirroring + MDRAID5 (1 VM)
Pattern	Numjobs	IOdepth	IOPS	MiB/s	Latency (ms)
4k random read	1	1	1,112	4	0.897
4k random write	1	1	501	2	1.991
4k random write (synchronous)	1	1	226	1	4.415

Storage Spaces Direct (TCP) – Nested mirror-accelerated parity – Data in mirror tier (1 VM)
Pattern	Numjobs	IOdepth	IOPS	MiB/s	Latency (ms)
4k random read	1	1	7,221	28	0.137
4k random write	1	1	5,456	21	0.182
4k random write (synchronous)	1	1	2,887	11	0.344

Storage Spaces Direct (TCP) – Nested mirror-accelerated parity – Data in mirror and parity tiers (1 VM)
Pattern	Numjobs	IOdepth	IOPS	MiB/s	Latency (ms)
4k random read	1	1	5,920	23	0.167
4k random write	1	1	2,517	10	0.395
4k random write (synchronous)	1	1	1,772	7	0.562
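
The three columns in these tables are tied together: with a 4 KiB block size, throughput in MiB/s equals IOPS × 4 / 1024, and with a single IO in flight, IOPS is approximately 1000 divided by the average latency in milliseconds. The sketch below is a minimal cross-check of the table values; the helper names are illustrative.

```python
BLOCK_KIB = 4  # 4k block size used in all three patterns


def mib_per_s(iops: float) -> float:
    """Throughput in MiB/s for a given IOPS figure at 4 KiB per IO."""
    return iops * BLOCK_KIB / 1024


def expected_iops_qd1(latency_ms: float) -> float:
    """IOPS implied by the average latency when only one IO is in flight."""
    return 1000.0 / latency_ms


# (label, reported IOPS, reported latency in ms), taken from the tables above.
rows = [
    ("StarWind VSAN, random read", 1112, 0.897),
    ("StarWind VSAN, random write", 501, 1.991),
    ("S2D mirror tier, random read", 7221, 0.137),
    ("S2D both tiers, random write", 2517, 0.395),
]

for label, iops, latency_ms in rows:
    print(f"{label}: {mib_per_s(iops):.1f} MiB/s, "
          f"latency implies ~{expected_iops_qd1(latency_ms):.0f} IOPS "
          f"(reported {iops})")
```

The implied and reported IOPS agree to within about 1%, which is what one would expect from rounded latency values.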

Benchmark results in graphs

This section presents visual comparisons of the performance and latency metrics across storage configurations under research.

4k random read:

Figure 1: 4K RR (IOPS)

Figure 1 shows IOPS for the 4K random read test at an IO depth of 1 with numjobs = 1. S2D with the workload in the mirror tier outperforms the StarWind VSAN NVMe-oF HA scenario, delivering 7,221 IOPS.

This remarkable 550% increase over StarWind’s 1,112 IOPS is primarily due to S2D’s ability to leverage local reads and operate at the host level. In contrast, StarWind VSAN, running inside a VM, encounters a much longer datapath, which negatively impacts its performance.

Even when S2D operates with data across both mirror and parity tiers, it maintains strong performance at 5,920 IOPS, still surpassing StarWind by 432%.

Figure 2: 4K RR (Latency)

Latency metrics for the 4K random read test at an IO depth of 1, as shown in Figure 2, similarly favor Storage Spaces Direct with the workload in the mirror tier, which records a swift 0.137 ms. This is about 85% lower than StarWind VSAN’s 0.897 ms (StarWind’s latency is roughly 6.5 times higher). This advantage is again due to S2D’s local read capabilities and direct host-level operation. Even in a mixed-tier setup, S2D maintains its lead with a latency of 0.167 ms, about 81% lower than StarWind’s.
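
Percentage comparisons of latency depend on which value is taken as the baseline: "X% higher" and "Y% lower" describe the same gap with different numbers, and a value can never be more than 100% lower than another. The sketch below is a minimal illustration using the rounded Figure 2 values; the helper names are hypothetical.

```python
def pct_higher(value: float, baseline: float) -> float:
    """How much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


def pct_lower(value: float, baseline: float) -> float:
    """How much smaller `value` is than `baseline`, in percent (at most 100)."""
    return (1.0 - value / baseline) * 100.0


s2d_mirror_ms = 0.137  # S2D, workload in the mirror tier
starwind_ms = 0.897    # StarWind VSAN

print(f"StarWind latency is {pct_higher(starwind_ms, s2d_mirror_ms):.1f}% higher than S2D's")
print(f"S2D latency is {pct_lower(s2d_mirror_ms, starwind_ms):.1f}% lower than StarWind's")
```

With these inputs the gap works out to about 554.7% higher for StarWind, or equivalently about 84.7% lower for S2D; results computed from rounded table values may differ slightly from percentages derived from raw measurements.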

Figure 3: 4K RW (IOPS)

Figure 3 shows the results of the 4K random write test at an IO depth of 1 with numjobs = 1. Storage Spaces Direct with the workload in the mirror tier achieves 5,456 IOPS, roughly 990% higher than StarWind VSAN’s 501 IOPS. This advantage stems from S2D’s ability to write directly to the mirror tier, bypassing the resource-intensive parity calculation.

However, when S2D handles workloads across both mirror and parity tiers, performance drops to 2,517 IOPS due to the additional overhead of invalidating data in the parity tier. For a deeper dive into how reading and writing function in a Nested mirror-accelerated parity scenario, please refer to the detailed explanation provided here.

On the other hand, StarWind VSAN, which writes directly to the MDRAID5 array, experiences performance degradation due to the read-modify-write (RMW) operations and the extended IO datapath inherent in its VM-based operation. Despite these technical challenges, the performance of StarWind VSAN at QD=1 appears unusually low, prompting us to initiate an investigation into this issue to uncover the underlying cause.

Figure 4: 4K RW (Latency)

Moving on to Figure 4, we examine the latency metrics for 4K random writes.

No surprises here. Storage Spaces Direct (S2D) continues to deliver superior performance, benefiting from its efficient data handling within the mirror tier and achieving a low latency of 0.182 ms. That is about 91% lower than StarWind VSAN’s 1.991 ms (StarWind’s latency is roughly 11 times higher).

Even when S2D operates across both mirror and parity tiers, it maintains a competitive latency of 0.395 ms, about 80% lower than StarWind’s.

Figure 5: 4K RW Synchronous (IOPS)

In our synchronous 4K RW single-threaded IO tests, as shown in Figure 5, Storage Spaces Direct with the dataset in the mirror tier once again takes the lead, achieving 2,887 IOPS, a 1,177% increase over StarWind VSAN’s 226 IOPS.

This performance boost is attributed to the same factors observed in the asynchronous 4K random write test, where S2D benefits from direct writes to the mirror tier, effectively bypassing the resource-heavy parity calculations.

Even in the “mixed tiers” setup, S2D maintains a strong advantage, delivering 1,772 IOPS and still outpacing StarWind by 684%.

Figure 6: 4K RW Synchronous (Latency)

Figure 6 highlights the latency results for synchronous 4K RW single-threaded IO, further confirming S2D’s performance edge.

With the workload within the mirror tier, Storage Spaces Direct achieves a write latency of 0.344 ms, about 92% lower than StarWind VSAN’s 4.415 ms (StarWind’s latency is roughly 12.8 times higher).

Even when using both the mirror and parity tiers, S2D maintains a strong latency advantage at 0.562 ms, about 87% lower than StarWind’s. This superior performance stems from S2D’s efficient data handling, consistently delivering lower latency across varying configurations.

Conclusion

To sum it up, both Storage Spaces Direct and StarWind VSAN come with their own set of perks and trade-offs for your IT infrastructure.

Storage Spaces Direct shines in read performance, particularly when virtual machines are aligned with CSV owner nodes. However, we observed some unexpected performance issues during 4K and 64K random-write tests, where S2D sometimes underperformed with data in the mirror tier compared to when it spanned both mirror and parity tiers. This highlights the need for careful monitoring to ensure optimal performance. Mismanagement of workloads can lead to significant performance drops, particularly at certain queue depths. Additionally, S2D requires extra space for fault tolerance, which can impact overall capacity efficiency.

On the other hand, StarWind VSAN proves to be a solid choice for high-performance environments, especially with mixed read/write or write-heavy workloads. It consistently delivers superior write performance under load, regardless of VM placement, and offers better capacity efficiency. However, StarWind VSAN lacks the local read boost that S2D provides, can be more demanding on CPU resources, and showed some anomalies in single-threaded tests.

So, if you’re looking for exceptional read performance and don’t mind keeping a close eye on your workloads, S2D is a great option. But if you want consistent write and mixed IO performance with better capacity efficiency, StarWind VSAN in the NVMe-oF HA configuration is the way to go.

Stay tuned for our upcoming articles, where we’ll dive deeper into these solutions to give you a bigger picture of how they can fit into your IT strategy.
