
Linux NVMe-oF Initiator and StarWind NVMe-oF Initiator for Windows: Performance Comparison Part 2.1

  • June 13, 2023
  • 14 min read
Volodymyr, StarWind DevOps Team Lead, possesses broad expertise in virtualization, storage, and networking, with exceptional experience in architecture planning, storage protocols, hardware sourcing, and research.

Introduction

As you already know, NVMe-oF (Non-Volatile Memory Express over Fabrics) is a groundbreaking technology that extends the advantages of NVMe across a network. In short, NVMe-oF allows direct access to storage media over high-speed networks, capitalizing on the speed and capacity of the NVMe architecture.

What does this mean in practice? NVMe-oF significantly ramps up data transfer speeds, cuts down latency, and enhances overall storage efficiency. Basically, this technology takes storage performance to new levels and could change the entire approach to building storage solutions.

Purpose

In our previous material, we compared the storage performance of the Linux NVMe-oF Initiator and the StarWind NVMe-oF Initiator for Windows over RDMA and TCP.
That setup is about as close to an actual business environment as it gets, and the results showed that the two initiators performed on par with each other, whether over RDMA or TCP.

Now, we’re going to try it again with the same setup but different NVMe-oF implementations. Our purpose here is to compare the performance of the Linux NVMe-oF Initiator with the StarWind NVMe-oF Initiator for Windows (versions 1.9.0.567 and 1.9.0.578) over RDMA and TCP. Let’s see how they fare this time!

Benchmarking Methodology, Details & Results

The benchmark was conducted with the help of the fio utility in Linux NVMe-oF Initiator scenarios and diskspd in StarWind NVMe-oF Initiator for Windows scenarios.

Testbed:

NVMe-oF

Hardware:

nvmeof-target Supermicro (SYS-220U-TNR)
CPU Intel® Xeon® Platinum 8352Y @ 2.20GHz
Sockets 2
Cores/Threads 64/128
RAM 256 GB
Storage 1x Intel® Optane™ SSD DC P5800X Series (400GB)
NIC 1x ConnectX®-5 EN 100GbE (MCX516A-CDAT)

Software:

OS Ubuntu 20.04.5 (5.15.0-67-generic)
SPDK v22.09
fio 3.16
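
For context, publishing the Optane drive through an SPDK NVMe-oF target boils down to a handful of RPC calls along these lines (a minimal sketch, not our exact configuration; the PCIe address, IP address, and NQN are placeholders):

# attach the local NVMe drive as an SPDK block device (placeholder PCIe address)
scripts/rpc.py bdev_nvme_attach_controller -b Nvme0 -t PCIe -a 0000:01:00.0
# create the transport (RDMA here; use -t TCP for the TCP runs)
scripts/rpc.py nvmf_create_transport -t RDMA
# create a subsystem, add the drive as its namespace, and listen on the 100GbE link
scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a
scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Nvme0n1
scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t rdma -f ipv4 -a 192.168.100.1 -s 4420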

Hardware (linux-cli and windows-cli):

Supermicro (2029UZ-TR4+)
CPU 2x Intel® Xeon® Platinum 8268 Processor @ 2.90GHz
Sockets 2
Cores/Threads 48/96
RAM 96 GB
NIC 1x ConnectX®-5 EN 100GbE (MCX516A-CDAT)

Software linux-cli:

OS Ubuntu 20.04.5 (5.4.0-139-generic)
fio 3.16
nvme-cli 1.9
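
On the linux-cli side, discovery and connection are a couple of nvme-cli commands (a sketch; the target address and NQN are placeholders, and -t tcp replaces -t rdma for the TCP runs):

# discover the target's subsystems, then connect to the exported namespace
nvme discover -t rdma -a 192.168.100.1 -s 4420
nvme connect -t rdma -a 192.168.100.1 -s 4420 -n nqn.2016-06.io.spdk:cnode1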

Software windows-cli:

OS Windows Server 2019 Standard Edition (Version 1809)
diskspd 2.1.0-dev
StarWind NVMe-oF Initiator 1.9.0.567 and 1.9.0.578

Test Patterns:

  • random read 4k;
  • random write 4k;
  • random read 64K;
  • random write 64K;
  • sequential read 1M;
  • sequential write 1M.

Linux – fio parameter examples:

# The “numjobs” and “iodepth” parameters vary from pattern to pattern.
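
For illustration, a random read 4k run might be launched like this (a minimal sketch; the device path and runtime are assumptions, with “numjobs” and “iodepth” set per the tables below):

# 4k random read against the raw NVMe namespace, page cache bypassed
fio --name=randread-4k --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --numjobs=6 --iodepth=4 \
    --time_based --runtime=60 --group_reporting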

Windows – diskspd parameter examples:

# The “-t” (threads) and “-o” (number of outstanding I/O requests) parameters vary from pattern to pattern.
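
The diskspd counterpart of the same pattern could look like this (again a sketch; the target drive number and duration are assumptions, with “-t” and “-o” set per the tables below):

# 4K random read against physical drive 1, software and hardware caching disabled, latency stats on
diskspd.exe -b4K -r -w0 -t12 -o4 -d60 -Sh -L #1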

Performance Comparison: Linux and Windows Storage Stack:

To better understand this setup, let's first compare the local performance of a single NVMe drive on Linux and on Windows.

Linux local vs. Windows local, 1x Intel® Optane™ SSD DC P5800X Series (400GB) (L = Linux, W = Windows; the W/L columns show Windows as a percentage of Linux):

| pattern | numjobs (L) | iodepth (L) | IOPs (L) | MiB/s (L) | latency ms (L) | CPU (L) | numjobs (W) | iodepth (W) | IOPs (W) | MiB/s (W) | latency ms (W) | CPU (W) | IOPs (W/L) | MiB/s (W/L) | latency (W/L) | CPU (W/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random read 4k | 6 | 4 | 1,552,000 | 6,061 | 0.015 | 4% | 12 | 4 | 1,508,000 | 5,890 | 0.030 | 17% | 97% | 97% | 200% | 425% |
| random write 4k | 6 | 4 | 1,160,000 | 4,532 | 0.020 | 3% | 10 | 4 | 1,113,000 | 4,349 | 0.030 | 14% | 96% | 96% | 150% | 467% |
| random read 64K | 2 | 2 | 109,000 | 6,799 | 0.036 | 1% | 3 | 2 | 111,000 | 6,919 | 0.050 | 2% | 102% | 102% | 139% | 200% |
| random write 64K | 2 | 2 | 75,500 | 4,717 | 0.052 | 1% | 2 | 2 | 73,000 | 4,565 | 0.050 | 1% | 97% | 97% | 96% | 100% |
| read 1M | 1 | 2 | 7,034 | 7,034 | 0.284 | 1% | 1 | 2 | 7,030 | 7,030 | 0.280 | 1% | 100% | 100% | 99% | 100% |
| write 1M | 1 | 2 | 4,790 | 4,790 | 0.417 | 1% | 1 | 2 | 4,767 | 4,767 | 0.420 | 1% | 100% | 100% | 101% | 100% |

As you can see for yourself, in the random read/write 4k patterns Windows needs roughly double the ‘numjobs’ value to achieve comparable performance, which pushes latency to 150–200% and CPU usage to 425–467% of the Linux figures.

Similar results appear in the random read 64K pattern, while performance is virtually identical for both operating systems in the random write 64K, read 1M, and write 1M patterns. In other words, the differences in the two NVMe-oF initiator benchmarks described below are no mystery once you account for this gap between the Linux and Windows storage stacks.

Benchmarking Results:

Performance Comparison: Linux NVMe-oF Initiator and StarWind NVMe-oF Initiator for Windows (v1.9.0.567 and v1.9.0.578) over RDMA:

Remote (RDMA): Linux NVMe-oF Initiator (L) vs. Windows StarWind NVMe-oF Initiator 1.9.0.567 (W), 1x Intel® Optane™ SSD DC P5800X Series (400GB); the W/L columns show Windows as a percentage of Linux:

| pattern | numjobs (L) | iodepth (L) | IOPs (L) | MiB/s (L) | latency ms (L) | CPU (L) | numjobs (W) | iodepth (W) | IOPs (W) | MiB/s (W) | latency ms (W) | CPU (W) | IOPs (W/L) | MiB/s (W/L) | latency (W/L) | CPU (W/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random read 4k | 10 | 4 | 1,500,000 | 5,861 | 0.026 | 12% | 24 | 4 | 1,132,414 | 4,423 | 0.073 | 31% | 75% | 75% | 281% | 258% |
| random write 4k | 8 | 4 | 1,130,000 | 4,413 | 0.027 | 9% | 24 | 4 | 1,105,729 | 4,319 | 0.087 | 31% | 98% | 98% | 322% | 344% |
| random read 64K | 3 | 2 | 110,000 | 6,850 | 0.054 | 1% | 4 | 2 | 110,328 | 6,895 | 0.072 | 2% | 100% | 101% | 133% | 200% |
| random write 64K | 3 | 2 | 75,400 | 4,712 | 0.079 | 1% | 3 | 2 | 74,819 | 4,676 | 0.080 | 1% | 99% | 99% | 101% | 100% |
| read 1M | 1 | 2 | 7,033 | 7,033 | 0.283 | 1% | 1 | 2 | 7,030 | 7,030 | 0.284 | 1% | 100% | 100% | 100% | 100% |
| write 1M | 1 | 2 | 4,788 | 4,788 | 0.416 | 1% | 1 | 2 | 4,787 | 4,787 | 0.418 | 1% | 100% | 100% | 100% | 100% |

The optimizations in StarWind NVMe-oF Initiator for Windows v1.9.0.578 have played a vital role. Thanks to them, we managed to gain an 11% performance increase in the random read 4k pattern and 4% in the random write 4k pattern (compared to v1.9.0.567). As a result, the thread count (‘numjobs’) has been reduced from 24 in v1.9.0.567 to 16 or 18 in v1.9.0.578, leading to decreased latency and CPU usage values.

When compared with Linux NVMe-oF Initiator, StarWind NVMe-oF Initiator for Windows only shows a difference in the random read 4k pattern (14% lower performance). The results are largely comparable in all other patterns. Higher latency and CPU usage values are explained in the section “Performance Comparison: Linux and Windows Storage Stack.”

Performance Comparison: Linux NVMe-oF Initiator and StarWind NVMe-oF Initiator for Windows (v1.9.0.567 and v1.9.0.578) over TCP:

Remote (TCP): Linux NVMe-oF Initiator (L) vs. Windows StarWind NVMe-oF Initiator 1.9.0.567 (W), 1x Intel® Optane™ SSD DC P5800X Series (400GB); the W/L columns show Windows as a percentage of Linux:

| pattern | numjobs (L) | iodepth (L) | IOPs (L) | MiB/s (L) | latency ms (L) | CPU (L) | numjobs (W) | iodepth (W) | IOPs (W) | MiB/s (W) | latency ms (W) | CPU (W) | IOPs (W/L) | MiB/s (W/L) | latency (W/L) | CPU (W/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random read 4k | 16 | 16 | 1,536,000 | 6,000 | 0.165 | 18% | 30 | 16 | 1,080,908 | 4,222 | 0.441 | 23% | 70% | 70% | 267% | 128% |
| random write 4k | 12 | 16 | 1,123,000 | 4,387 | 0.170 | 12% | 20 | 16 | 1,046,539 | 4,088 | 0.303 | 22% | 93% | 93% | 178% | 183% |
| random read 64K | 6 | 4 | 110,000 | 6,867 | 0.217 | 4% | 7 | 4 | 105,154 | 6,572 | 0.266 | 5% | 96% | 96% | 123% | 125% |
| random write 64K | 6 | 4 | 75,300 | 4,705 | 0.318 | 3% | 6 | 4 | 74,937 | 4,683 | 0.320 | 3% | 100% | 100% | 101% | 100% |
| read 1M | 3 | 4 | 7,000 | 7,000 | 1.711 | 3% | 3 | 4 | 7,032 | 7,032 | 1.706 | 3% | 100% | 100% | 100% | 100% |
| write 1M | 3 | 2 | 4,435 | 4,435 | 1.351 | 1% | 2 | 2 | 4,786 | 4,786 | 1.253 | 1% | 108% | 108% | 93% | 100% |

Remote (TCP): Linux NVMe-oF Initiator (L) vs. Windows StarWind NVMe-oF Initiator 1.9.0.578 (W), 1x Intel® Optane™ SSD DC P5800X Series (400GB); the W/L columns show Windows as a percentage of Linux:

| pattern | numjobs (L) | iodepth (L) | IOPs (L) | MiB/s (L) | latency ms (L) | CPU (L) | numjobs (W) | iodepth (W) | IOPs (W) | MiB/s (W) | latency ms (W) | CPU (W) | IOPs (W/L) | MiB/s (W/L) | latency (W/L) | CPU (W/L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| random read 4k | 16 | 16 | 1,536,000 | 6,000 | 0.165 | 18% | 32 | 16 | 1,444,087 | 5,640 | 0.352 | 29% | 94% | 94% | 213% | 161% |
| random write 4k | 12 | 16 | 1,123,000 | 4,387 | 0.170 | 12% | 22 | 16 | 1,176,025 | 4,593 | 0.298 | 25% | 105% | 105% | 175% | 208% |
| random read 64K | 6 | 4 | 110,000 | 6,867 | 0.217 | 4% | 7 | 4 | 109,877 | 6,867 | 0.254 | 5% | 100% | 100% | 117% | 125% |
| random write 64K | 6 | 4 | 75,300 | 4,705 | 0.318 | 3% | 6 | 4 | 76,096 | 4,756 | 0.315 | 3% | 101% | 101% | 99% | 100% |
| read 1M | 3 | 4 | 7,000 | 7,000 | 1.711 | 3% | 3 | 4 | 7,028 | 7,028 | 1.706 | 3% | 100% | 100% | 100% | 100% |
| write 1M | 3 | 2 | 4,435 | 4,435 | 1.351 | 1% | 2 | 2 | 4,698 | 4,698 | 0.851 | 1% | 106% | 106% | 63% | 100% |

Conclusion

Thanks to these adjustments, StarWind NVMe-oF Initiator for Windows v1.9.0.578 increased its performance by 24% in the random read 4k pattern, by 12% in the random write 4k pattern, and by 4% in the random read 64K pattern. Not bad at all! Compared with the Linux NVMe-oF Initiator, StarWind NVMe-oF Initiator for Windows v1.9.0.578 only gives way in the random read 4k pattern (by 6%); in all other patterns its performance is comparable or higher. I hope this information proves useful and assists you in the future!

This material has been prepared in collaboration with Viktor Kushnir, Technical Writer with almost 4 years of experience at StarWind.

Hey! Found Volodymyr’s insights useful? Looking for a cost-effective, high-performance, and easy-to-use hyperconverged platform?
Taras Shved, StarWind HCI Appliance Product Manager
Look no further! StarWind HCI Appliance (HCA) is a plug-and-play solution that combines compute, storage, networking, and virtualization software into a single easy-to-use hyperconverged platform. It's designed to significantly trim your IT costs and save valuable time. Interested in learning more? Book your StarWind HCA demo now to see it in action!