
Configuring Highly Available NVMe-oF Storage in Proxmox VE

  • July 11, 2024
  • 21 min read
Vladyslav is a StarWind Solutions Architect with broad expertise in virtualization technologies and a strong background in storage and system administration.

Hello, fellow IT and virtualization gurus! Today, we’re diving into the world of fast storage. NVMe storage is becoming increasingly popular and in demand in our rapidly evolving industry. However, with the rise of NVMe storage, the IT world faced the challenge of fully leveraging its capabilities. Traditional storage protocols like iSCSI couldn’t unlock the full potential of NVMe storage. To address this, the NVMe-oF (NVMe over Fabrics) storage protocol was created, enabling us to harness NVMe’s capabilities to the fullest.

NVMe-oF is now leading the way in providing low-latency, ultra-performance storage, completely surpassing the ‘old school’ iSCSI. StarWind is at the forefront of NVMe adoption and is currently the only vendor offering a production-ready NVMe Initiator for Windows.

So, let’s dive into the details and explore what makes NVMe-oF a game-changer and how StarWind is leading the charge in this exciting space.

NVMe-oF vs iSCSI

We’re all familiar with iSCSI: a proven, reliable, and stable storage protocol that has served us well for decades. We appreciate iSCSI for its balance of performance and simplicity; it’s like an old sports car: good-looking, reasonably fast, and fun to drive. However, just like an old sports car, iSCSI has its limitations. No amount of tuning will let it keep up with modern vehicles (and it doesn’t have Apple CarPlay either!). iSCSI is built on an older transport stack that shows its age, especially when paired with cutting-edge hardware like PCIe 4.0 and 5.0 NVMe drives, modern CPUs, and network cards. The entire iSCSI software stack, including OS drivers, initiators, and targets, wasn’t designed to handle the super-high I/O command parallelism that modern hardware supports.

NVMe-oF, on the other hand, is a modern protocol designed with today’s hardware in mind, free from the performance limitations of iSCSI. It offers ultra-low latency, exceptional CPU efficiency, and immense performance—just a few of the benefits when using NVMe storage with the NVMe-oF protocol. However, unlike iSCSI, NVMe-oF can be more complex to configure properly. Setting up a stable, reliable, and highly available storage configuration without performance loss can be challenging.

You might already be thinking about the hurdles you’ll face when implementing this in your production environment. Don’t worry — StarWind has you covered with its StarWind VSAN software, providing a solution that simplifies the complexity of high-availability NVMe-oF deployment.

NVMe-oF over TCP vs NVMe-oF over RDMA

One of the key advantages of the NVMe-oF protocol is its versatility, offering two configuration options depending on your network equipment: NVMe-oF over TCP and NVMe-oF over RDMA. This flexibility significantly broadens the range of environments where the protocol can be successfully implemented, ensuring that more companies can benefit from its performance enhancements. Let’s explore how these options can be leveraged with StarWind VSAN software:

Option 1: NVMe-oF over TCP

The first option is NVMe-oF over TCP. This configuration is faster than iSCSI and uses the standard TCP protocol as the transport, allowing the use of commodity hardware. This means you can seamlessly replace iSCSI with NVMe-oF without reconfiguring your servers or replacing any components. You’ll still benefit from lower latency and reduced CPU utilization on your all-flash storage, even if it’s not NVMe.

In our simplified (and not fully redundant) 2-node hyperconverged NVMe-oF over TCP configuration, we use one dedicated port for the replication network and another for NVMe-oF data traffic. The data traffic port and the management network handle the heartbeat, while the data traffic port also functions as a listener.


Figure 1. Example of NVMe-oF over TCP configuration with StarWind VSAN

 

As you can see, it is a very straightforward configuration. However, you can add extra “NVMe-oF Partner” and “Replication” connections to achieve a fully redundant configuration.

Option 2: NVMe-oF over RDMA

NVMe-oF over RDMA configuration allows you to maximize your storage performance, offering even lower latency than NVMe-oF over TCP. This results in ultra-fast storage capable of handling any type of workload. However, this configuration requires RDMA-capable network adapters.

In this simplified RDMA configuration with SR-IOV, we use one network port for replication traffic and another dedicated port for the local NVMe-oF data network. The management network handles the heartbeat. Both ports on each node (for replication and the local network) are used as listeners.

 


Figure 2. Example of NVMe-oF over RDMA configuration with StarWind VSAN

 

In both configurations, complete redundancy requires adding additional ‘Replication’ and ‘NVMe-oF Data’ connections.

In this article, I will start by showcasing the first configuration: StarWind VSAN with NVMe-oF over TCP. The RDMA-enabled setup will be covered in my next articles.

StarWind VSAN NVMe-oF storage device configuration

StarWind VSAN supports a variety of hypervisors, so you can pick the one that meets your requirements. Given Proxmox VE’s surge in popularity following the Broadcom-VMware deal, today I will show you how to configure StarWind NVMe-oF HA on Proxmox VE.

Assumptions and considerations

StarWind VSAN is deployed as the Controller Virtual Machine (CVM). The deployment steps for the StarWind CVM, CVM network configuration, and storage addition options can be found in this Configuration Guide. The following steps assume that you have successfully deployed StarWind VSAN CVM on both cluster nodes and completed the Initial Configuration Wizard.

Configuring HA storage

The steps for configuring highly available NVMe-oF storage for Proxmox VE are as follows:

1. Ensure all the disks are correctly added to each CVM and are ready to be used.

2. Log into each StarWind VSAN CVM Web-UI and navigate to the “Physical Disks” tab. You should see your NVMe disks listed there:


3. Now, to create a storage pool, navigate to the “Storage Pool” tab and open the storage configuration wizard:


4. Read the guidelines and press “Next”:


5. Next, select the CVM on which you want to create a storage pool and press “Next”:


6. Now, select the drives to add into the storage pool:


7. Select the RAID type for configuration and press “Next”. You will have two automatic configuration presets and one manual option:

‘High Capacity’ represents a recommended RAID 5 configuration.

‘High Performance’ represents a recommended RAID 1/RAID 10 configuration, depending on the number of selected drives.

‘Manual’ lets you configure the desired type of storage pool and set all of its parameters yourself.

 


8. On the “Summary” step, check the specified parameters and press “Create” to start initializing the storage pool:


QUICK NOTE: You can select both CVMs simultaneously to complete the configuration faster or configure each side separately. I’ve chosen the latter as it’s more convenient for me.

After creating the storage pool, it is time to create the local volumes that will be used for our HA storage.

9. Navigate to the “Volumes” tab and open the configuration wizard:


10. In the Create volume wizard, select both CVMs and press “Next”:


QUICK NOTE: Like with the storage pools, you can select partnered CVMs simultaneously to complete the configuration faster or configure each side separately.

11. Assign the name for both volumes, specify their capacity, and press “Next”:


12. Select the optimal filesystem settings for your volume. For proper NVMe-oF configuration, we don’t need a filesystem, so choose ‘Raw’ and press “Next”.


13. The final step is to check that we picked the right configuration parameters and press “Create”:


14. The newly created volumes will be shown as ‘Unclaimed.’ This is expected, so don’t worry about that status.


15. The final part of the NVMe-oF HA configuration is creating an HA LUN. Navigate to the “LUNs” tab and open the “Create LUN” wizard:


16. First, we need to select the protocol. In our case, it is obviously NVMe-oF:


17. Choose the LUN availability option. In our case, we want to protect the data, so select ‘High availability’:


18. Then, select both CVMs to configure the replication between them:


19. Since we only have a single volume on each machine, we don’t need to select anything else. Press “Next”:


20. Choose the failover strategy. In this version of StarWind VSAN, only one option is available for NVMe-oF setups: ‘Heartbeat’. Additional options will be added in future updates. Press “Next”:


21. Specify the LUN name and select the networks to be used for discovery and further connectivity with the NVMe-oF HA storage. Choose the adapters previously configured as the ‘Data’ networks:


22. Now, double-check the summary of configuration parameters and press the “Create LUN” button:


Great! At this point, we are done with the StarWind VSAN portion. Now, we need to discover and connect our NVMe-oF storage device to Proxmox VE.

We start by installing the nvme-cli package and configuring the nvme_tcp kernel module to load at boot on each Proxmox host. Proxmox VE does not ship with these packages by default, and NVMe-oF functionality is not exposed in the Proxmox VE Web UI, so everything has to be done from the shell.

The package installation and kernel module configuration commands are as follows:

Packages

apt update && apt -y install nvme-cli

NVMe kernel module

modprobe nvme_tcp && echo "nvme_tcp" > /etc/modules-load.d/nvme_tcp.conf
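To be safe, you can verify that the module is loaded and registered for autoload (a quick sanity check, nothing StarWind-specific):

lsmod | grep nvme_tcp                    # the module should be listed
cat /etc/modules-load.d/nvme_tcp.conf    # should contain "nvme_tcp"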

Now, we need to install multipathing tools by running the following command:

apt-get install multipath-tools
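On Debian-based systems the multipath daemon usually starts automatically after installation, but it doesn’t hurt to make sure it is enabled (assuming a systemd-managed host):

systemctl enable --now multipathd    # start the daemon now and on every boot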

The next step is to create the configuration file for the multipathing tools:

touch /etc/multipath.conf

Now, edit the configuration file with any editor tool like “nano”:

nano /etc/multipath.conf

QUICK NOTE: the “nano” editor is used here as an example; any editor will do.

Add the following content to the “multipath.conf” file:

devices {
    device {
        vendor "STARWIND"
        product "STARWIND*"
        path_grouping_policy multibus
        path_checker "tur"
        failback immediate
        path_selector "round-robin 0"
        rr_min_io 3
        rr_weight uniform
        hardware_handler "1 alua"
    }
}

defaults {
    polling_interval 2
    path_selector "round-robin 0"
    path_grouping_policy multibus
    uid_attribute ID_SERIAL
    rr_min_io 100
    failback immediate
    no_path_retry queue
    user_friendly_names yes
}
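After saving the file, restart the multipath daemon so the new settings take effect (a minimal check, assuming systemd manages multipathd):

systemctl restart multipathd
multipath -ll    # lists multipath devices (empty until the NVMe-oF storage is connected)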

Now, let’s discover our NVMe-oF storage:

nvme discover -t tcp -a <CVM_Data_Network_IP_Address> -s 8009

QUICK NOTE: You need to use the IPs of “Data” networks from all partnered CVMs so you have multiple paths to storage.
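For example, with two partnered CVMs you can run the discovery against each data IP in one go (the 172.16.10.x addresses below are hypothetical; substitute your own):

# run discovery against both CVM data IPs (placeholder addresses)
for ip in 172.16.10.1 172.16.10.2; do
    nvme discover -t tcp -a "$ip" -s 8009
done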

After discovering the storage, let’s connect to it:

nvme connect -t tcp -n <NQN_of_NVMe-oF_target> -a <CVM_Data_Network_IP_Address> -s 8009

QUICK NOTE: You need to use the IPs of “Data” networks from all partnered CVMs to have redundancy in case one of the nodes has issues with the local storage.
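As a sketch, connecting through both data paths with the same subsystem NQN might look like this (the NQN and IPs are placeholders; the real NQN comes from the CVM Web-UI, as shown below):

# placeholder NQN and addresses; substitute your own values
for ip in 172.16.10.1 172.16.10.2; do
    nvme connect -t tcp -n nqn.2008-08.com.starwindsoftware:nvme-ha-lun1 -a "$ip" -s 8009
done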

The NQN name of the NVMe-oF target can be found in the CVM Web-UI in the properties of our NVMe LUN. To locate it, follow these quick steps:

1. Go to the CVM’s Web UI. Then, navigate to the “LUNs” tab, select your NVMe-oF HA device, and click the “Manage LUN” button:


 

2. Navigate to LUN Settings and find the NQN:


Check if you have connected the storage by running the following command:

nvme list
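To double-check that both paths are visible, you can also inspect the subsystem topology and, if multipathing is in use, the aggregated device:

nvme list-subsys    # shows the subsystem NQN and its TCP paths
multipath -ll       # shows the multipath device built on top of them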

 

To keep connectivity with the NVMe-oF storage after a reboot, run the following commands:

echo "discover -t tcp -a <CVM_Data_Network_IP_Address> -s 8009" | tee -a /etc/nvme/discovery.conf

IMPORTANT: run the above command for all Data network IPs of both CVMs.
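Once the command has been run for every data IP, /etc/nvme/discovery.conf should end up looking something like this (addresses hypothetical):

# /etc/nvme/discovery.conf
discover -t tcp -a 172.16.10.1 -s 8009
discover -t tcp -a 172.16.10.2 -s 8009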

systemctl enable nvmf-autoconnect.service

After those steps, we create an LVM volume group on our NVMe-oF storage device:

vgcreate nvme_vg /dev/nvme0n1
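QUICK NOTE: if multipathd has claimed the NVMe namespaces, the volume group may need to be created on the aggregated /dev/mapper device instead of /dev/nvme0n1. A quick way to check (the “mpatha” name below is hypothetical):

multipath -ll                          # identify the multipath device name, if any
vgcreate nvme_vg /dev/mapper/mpatha    # use the mapper device when one exists
pvs && vgs                             # confirm the PV and VG were created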

Navigate to the Proxmox VE UI to configure the LVM volume group as cluster storage. Select your “Datacenter”, navigate to the “Storage” tab, then click the “Add” button and select “LVM”:


 

After that, specify the name of the cluster storage and select your NVMe volume group. Then, select all the Proxmox cluster members that need access to the storage, check the “Shared” checkbox, and click “Add”:

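If you prefer the shell, the same storage can be added with pvesm (the storage ID and node names below are hypothetical):

pvesm add lvm nvme-storage --vgname nvme_vg --shared 1 --nodes pve1,pve2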

 

And there you have it! You have successfully created the NVMe-oF HA storage device, which is now ready to host those I/O-hungry VMs.

Conclusion

If you are reading this part, you now know a fast alternative to your iSCSI storage: NVMe-oF over TCP. The protocol is developing rapidly and is widely expected to become an industry standard. You can try it now to see what it’s capable of and decide whether you need it. Additionally, NVMe-oF over RDMA will be available in StarWind VSAN later this year, bringing more options for NVMe-oF implementation and additional management features.

Hey! Found Vladyslav’s insights useful? Looking for a cost-effective, high-performance, and easy-to-use hyperconverged platform?
Taras Shved, StarWind HCI Appliance Product Manager
Look no further! StarWind HCI Appliance (HCA) is a plug-and-play solution that combines compute, storage, networking, and virtualization software into a single easy-to-use hyperconverged platform. It's designed to significantly trim your IT costs and save valuable time. Interested in learning more? Book your StarWind HCA demo now to see it in action!