Hello, fellow IT and virtualization gurus! Today, we’re diving into the world of fast storage. NVMe storage is becoming increasingly popular and in demand in our rapidly evolving industry. However, with the rise of NVMe storage, the IT world faced the challenge of fully leveraging its capabilities. Traditional storage protocols like iSCSI couldn’t unlock the full potential of NVMe storage. To address this, the NVMe-oF (NVMe over Fabrics) storage protocol was created, enabling us to harness NVMe’s capabilities to the fullest.
NVMe-oF is now leading the way in providing low-latency, ultra-performance storage, completely surpassing the ‘old school’ iSCSI. StarWind is at the forefront of NVMe adoption and is currently the only vendor offering a production-ready NVMe Initiator for Windows.
So, let’s dive into the details and explore what makes NVMe-oF a game-changer and how StarWind is leading the charge in this exciting space.
NVMe-oF vs iSCSI
We’re all familiar with iSCSI—a proven, reliable, and stable storage protocol that has served us well for decades. We appreciate iSCSI for its balance of performance and simplicity; it’s like an old sports car—good-looking, reasonably fast, and fun to drive. However, just like an old sports car, iSCSI has its limitations: you can’t tune it to keep up with modern vehicles (and it doesn’t have Apple CarPlay either!). iSCSI wraps the decades-old SCSI command set in TCP, and that design shows its age, especially when paired with cutting-edge hardware like PCIe 4.0 and 5.0 NVMe drives, modern CPUs, and network cards. The entire iSCSI software stack, including OS drivers, initiators, and targets, wasn’t designed to handle the super-high I/O command parallelism that modern hardware supports.
NVMe-oF, on the other hand, is a modern protocol designed with today’s hardware in mind, free from the performance limitations of iSCSI. It offers ultra-low latency, exceptional CPU efficiency, and immense performance—just a few of the benefits when using NVMe storage with the NVMe-oF protocol. However, unlike iSCSI, NVMe-oF can be more complex to configure properly. Setting up a stable, reliable, and highly available storage configuration without performance loss can be challenging.
You might already be thinking about the hurdles you’ll face when implementing this in your production environment. Don’t worry — StarWind has you covered with its StarWind VSAN software, providing a solution that simplifies the complexity of high-availability NVMe-oF deployment.
NVMe-oF over TCP vs NVMe-oF over RDMA
One of the key advantages of the NVMe-oF protocol is its versatility, offering two configuration options depending on your network equipment: NVMe-oF over TCP and NVMe-oF over RDMA. This flexibility significantly broadens the range of environments where the protocol can be successfully implemented, ensuring that more companies can benefit from its performance enhancements. Let’s explore how these options can be leveraged with StarWind VSAN software:
Option 1: NVMe-oF over TCP
The first option is NVMe-oF over TCP. This configuration is faster than iSCSI and uses standard TCP as the transport protocol, allowing the use of commodity hardware. This means you can seamlessly replace iSCSI with NVMe-oF without reconfiguring your servers or replacing any components. You’ll still benefit from lower latency and reduced CPU utilization on your all-flash storage even if it isn’t NVMe-based.
In our simplified (and not fully redundant) 2-node hyperconverged NVMe-oF over TCP configuration, we use one dedicated port for the replication network and another for NVMe-oF data traffic. The data traffic port and the management network handle the heartbeat, while the data traffic port also functions as a listener.
As you can see, it is a very straightforward configuration. However, you can add extra “NVMe-oF Partner” and “Replication” connections to achieve a fully redundant configuration.
Option 2: NVMe-oF over RDMA
NVMe-oF over RDMA configuration allows you to maximize your storage performance, offering even lower latency than NVMe-oF over TCP. This results in ultra-fast storage capable of handling any type of workload. However, this configuration requires RDMA-capable network adapters.
In this simplified RDMA with SR-IOV configuration, we use one network port for replication traffic and another dedicated port for the local NVMe-oF data network. The management network handles the heartbeat. Both ports on each node (for the replication and local data networks) are used as listeners.
In both configurations, complete redundancy requires adding additional ‘Replication’ and ‘NVMe-oF Data’ connections.
In this article, I will start by showcasing the first configuration: StarWind VSAN with NVMe-oF over TCP. The RDMA-enabled setup will be covered in my next articles.
StarWind VSAN NVMe-oF storage device configuration
StarWind VSAN supports a variety of hypervisors, so you can find one that meets your requirements. Given the surge in Proxmox VE’s popularity following the Broadcom-VMware deal, today I will show you how to configure StarWind NVMe-oF HA on Proxmox VE.
Assumptions and considerations
StarWind VSAN is deployed as the Controller Virtual Machine (CVM). The deployment steps for the StarWind CVM, CVM network configuration, and storage addition options can be found in this Configuration Guide. The following steps assume that you have successfully deployed StarWind VSAN CVM on both cluster nodes and completed the Initial Configuration Wizard.
Configuring HA storage
The steps for configuring highly available NVMe-oF storage for Proxmox VE are as follows:
1. Ensure all the disks are correctly added to each CVM and are ready to be used.
2. Log into each StarWind VSAN CVM Web-UI and navigate to the “Physical Disks” tab. You should see your NVMe disks listed there:
3. Now, to create a storage pool, navigate to the “Storage Pool” tab and open the storage configuration wizard:
4. Read the guidelines and press “Next”:
5. Next, select the CVM on which you want to create the storage pool and press “Next”:
6. Now, select the drives to add into the storage pool:
7. Select the RAID type for configuration and press “Next”. You will have two automatic configuration presets and one manual option:
‘High Capacity’ represents a recommended RAID 5 configuration.
‘High Performance’ represents a recommended RAID 1/RAID 10 configuration, depending on the number of selected drives.
‘Manual’ allows you to configure the desired type of storage pool with all the parameters.
8. On the “Summary” step, check the specified parameters and press “Create” to start initializing the storage pool:
QUICK NOTE: You can select both CVMs simultaneously to complete the configuration faster or configure each side separately. I’ve chosen the latter as it’s more convenient for me.
After creating the storage pool, it is time to create the local volumes that will be used for our HA storage.
9. Navigate to the “Volumes” tab and open the configuration wizard:
10. In the Create volume wizard, select both CVMs and press “Next”:
QUICK NOTE: Like with the storage pools, you can select partnered CVMs simultaneously to complete the configuration faster or configure each side separately.
11. Assign the name for both volumes, specify their capacity, and press “Next”:
12. Select the optimal filesystem settings for your volume. For proper NVMe-oF configuration, we don’t need a filesystem, so choose ‘Raw’ and press “Next”.
13. The final step is to check that we picked the right configuration parameters and press “Create”:
14. The newly created volumes will be shown as ‘Unclaimed.’ This is expected, so don’t worry about that status.
15. The final part of the NVMe-oF HA configuration is creating an HA LUN. Navigate to the “LUNs” tab and open the “Create LUN” wizard:
16. First, we need to select the protocol. In our case, it is obviously NVMe-oF:
17. Choose the LUN availability option. In our case, we want to protect the data, so select ‘High availability’:
18. Then, select both CVMs to configure the replication between them:
19. Since we only have a single volume on each machine, we don’t need to select anything else. Press “Next”:
20. Choose the failover strategy. In this version of StarWind VSAN, only one option is available for NVMe-oF setups: ‘Heartbeat’. Additional options will be added in future updates. Press ‘Next’:
21. Specify the LUN name and select the networks to be used for discovery and further connectivity with the NVMe-oF HA storage. Choose the adapters previously configured as the ‘Data’ networks:
22. Now, double-check the summary of configuration parameters and press “Create LUN” button:
Great! At this point, we are done with the StarWind VSAN portion. Now, we need to discover and connect our NVMe-oF storage device to Proxmox VE.
We start by installing the nvme-cli package and configuring the nvme_tcp kernel module to be loaded at boot on each Proxmox host. Proxmox VE does not ship with these packages by default, and NVMe-oF functionality is not exposed in the Proxmox VE Web UI, so we need to do everything from the shell terminal.
The package installation and kernel module configuration commands are as follows:
Packages
apt update && apt -y install nvme-cli
NVMe kernel module
modprobe nvme_tcp && echo "nvme_tcp" > /etc/modules-load.d/nvme_tcp.conf
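If you want to double-check that the module is active and will be loaded again on the next boot, a quick optional verification looks like this:
lsmod | grep nvme_tcp
cat /etc/modules-load.d/nvme_tcp.conf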
Now, we need to install multipathing tools by running the following command:
apt-get install multipath-tools
The next step is to create the configuration file for the multipathing tools:
touch /etc/multipath.conf
Now, edit the configuration file with any editor tool like “nano”:
nano /etc/multipath.conf
QUICK NOTE: The “nano” editor is used in the command example.
Add the following content to the “multipath.conf” file:
devices {
    device {
        vendor "STARWIND"
        product "STARWIND*"
        path_grouping_policy multibus
        path_checker "tur"
        failback immediate
        path_selector "round-robin 0"
        rr_min_io 3
        rr_weight uniform
        hardware_handler "1 alua"
    }
}
defaults {
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    uid_attribute ID_SERIAL
    rr_min_io 100
    failback immediate
    no_path_retry queue
    user_friendly_names yes
}
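After saving the file, restart the multipath service so it picks up the new configuration. Assuming the standard multipath-tools systemd unit, you can apply the settings and later verify the multipath topology like this:
systemctl restart multipathd
multipath -ll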
Now, let’s discover our NVMe-oF storage:
nvme discover -t tcp -a <CVM_Data_Network_IP_Address> -s 8009
QUICK NOTE: You need to use the IPs of “Data” networks from all partnered CVMs so you have multiple paths to storage.
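For illustration only, assuming hypothetical ‘Data’ network IPs of 172.16.10.10 and 172.16.20.10 on the partnered CVMs, the discovery commands would look like this:
nvme discover -t tcp -a 172.16.10.10 -s 8009
nvme discover -t tcp -a 172.16.20.10 -s 8009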
After discovering the storage, let’s connect to it:
nvme connect -t tcp -n <NQN_of_NVMe-oF_target> -a <CVM_Data_Network_IP_Address> -s 8009
QUICK NOTE: You need to use the IPs of “Data” networks from all partnered CVMs to have redundancy in case one of the nodes has issues with the local storage.
The NQN name of the NVMe-oF target can be found in the CVM Web-UI in the properties of our NVMe LUN. To locate it, follow these quick steps:
1. Go to the CVM’s Web UI. Then, navigate to the “LUNs” tab, select your NVMe-oF HA device, and click the “Manage LUN” button:
2. Navigate to LUN Settings and find the NQN:
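Once you have the NQN, the connect commands for both data paths might look like this (the NQN and IPs below are hypothetical placeholders, shown purely for illustration):
nvme connect -t tcp -n nqn.2008-08.com.example:nvme-ha-lun -a 172.16.10.10 -s 8009
nvme connect -t tcp -n nqn.2008-08.com.example:nvme-ha-lun -a 172.16.20.10 -s 8009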
Check if you have connected the storage by running the following command:
nvme list
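In addition to nvme list, you can confirm that both data paths are attached to the same subsystem with:
nvme list-subsys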
To keep the NVMe-oF storage connected after a reboot, run the following commands:
echo "discover -t tcp -a <CVM_Data_Network_IP_Address> -s 8009" | tee -a /etc/nvme/discovery.conf
IMPORTANT: run the above command for all Data network IPs of both CVMs.
systemctl enable nvmf-autoconnect.service
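With the discovery entries added for both CVMs, /etc/nvme/discovery.conf should end up containing one line per data path, for example (using the same illustrative IPs as above):
discover -t tcp -a 172.16.10.10 -s 8009
discover -t tcp -a 172.16.20.10 -s 8009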
After those steps, we create an LVM volume group on our NVMe-oF storage device:
vgcreate nvme_vg /dev/nvme0n1
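Keep in mind that the namespace may enumerate under a different name on your hosts (for example, /dev/nvme1n1), so it is worth confirming the device name with nvme list before running vgcreate and then checking the result, for instance:
nvme list
vgs nvme_vg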
Navigate to the Proxmox VE UI to configure the LVM volume group as the cluster storage. Select your “Datacenter”, navigate to the “Storage” tab, then click the “Add” button and select “LVM”:
After that, specify the name of the cluster storage and select your NVMe volume group. Then, select all the Proxmox cluster members that need access to the storage, check the “Shared” checkbox, and click “Add”:
And there you have it! You have successfully created the NVMe-oF HA storage device, which is now ready to host those I/O-hungry VMs.
Conclusion
If you are reading this part, you are already aware of a faster alternative to your iSCSI storage: NVMe-oF over TCP. The protocol is developing rapidly and is increasingly expected to become an industry standard. You can try it now to see what it’s capable of and decide whether you need it. Additionally, NVMe-oF over RDMA will be available in StarWind VSAN later this year, bringing more options for NVMe-oF implementation and additional management features.