Introduction: Hyper-V Replica – what’s the point?
Today I’m going to tell you everything I know about the virtual machine replication feature in Hyper-V, using Windows Server 2019 as an example.
Let’s pinpoint the main issues we should cover now:
- First things first: what replication is and why we use it;
- A few things to set straight, such as what should be done before configuring the hosts;
- How to configure replication with the built-in tools, in enough detail but without unnecessary information;
- And, of course, how your replication process can reach maximum efficiency.
The term “virtual machine replication” means creating a copy of a VM from the source host on another (replica) host and keeping that copy up to date.
Well, for starters, remember once and for all: replication IS NOT a backup! Just as a snapshot or RAID is not a backup. Basically, nothing is a backup except a backup. Let me quickly explain why. If the primary machine fails, yes, you can turn on its replica in no time and keep working. However, if it failed not because of a simple fault but because of complex problems at the application or OS level, those problems have most likely been replicated as well, so switching to the replica won’t do you much good. I’m willing to bet serious money that in such a case, your replicated VM will run for a couple of minutes tops and then die with the same symptoms as the original. Many experienced engineers who have maintained complex infrastructure have already run into exactly this.
What I’m saying is that replication can lend you a hand in improving your fault tolerance strategy, but it’s no panacea.
Hyper-V Replica and Fault Tolerance?
Hyper-V Replica, first introduced in Windows Server 2012, is an asynchronous VM replication mechanism. The minimal working configuration requires two hosts with the Hyper-V role enabled, connected by a network link. Once you configure replication for a specific VM, a copy of that VM is created on the target host, and the source host sends all changes made to the original VM at a specified interval. In Windows Server 2012, the interval is fixed at 5 minutes; in Windows Server 2019, it can be 30 seconds, 5 minutes, or 15 minutes.
If the original VM becomes unavailable, whether because of a host failure or anything else, you fail over to the replica. Failover is initiated manually; in a cluster, the Replica Broker role coordinates replication for the clustered VMs. The replica host then starts the replicated VM, and all that’s left is to redirect clients to it, keeping in mind that it may have a different IP address, VLAN settings, and so on. Because replication is asynchronous, an unplanned failover loses any changes made after the last completed replication cycle. In a planned failover, for example before maintenance on the primary site or source host, you shut down the original VM, Hyper-V Replica sends all changes made since the last replication, and only then does the replica take over.
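To put a rough number on how much an unplanned failover can lose, here’s a minimal Python sketch of the worst-case data-loss window. It relies on a simplified model, which is my assumption rather than Microsoft’s documented algorithm: a change written just after a cycle starts waits for the next cycle, and that cycle still needs some transfer time to reach the replica. The transfer time is a hypothetical input you’d measure yourself.

```python
# Rough worst-case data-loss window for an unplanned Hyper-V Replica failover.
# Simplified model (an assumption, not Microsoft's documented algorithm): a change
# written just after a replication cycle starts is only shipped in the NEXT cycle,
# and that cycle still needs `transfer_s` seconds to reach the replica.

ALLOWED_INTERVALS_S = (30, 300, 900)  # 30 s, 5 min, 15 min (Windows Server 2012 R2 and later)


def worst_case_loss_seconds(interval_s: int, transfer_s: float) -> float:
    """Upper bound, in seconds, on the changes an unplanned failover can lose."""
    if interval_s not in ALLOWED_INTERVALS_S:
        raise ValueError(f"unsupported replication interval: {interval_s} s")
    if transfer_s < 0:
        raise ValueError("transfer time cannot be negative")
    # One full interval of pending changes plus the cycle that is still in flight.
    return interval_s + transfer_s


if __name__ == "__main__":
    for interval in ALLOWED_INTERVALS_S:
        loss = worst_case_loss_seconds(interval, transfer_s=20)  # 20 s is a made-up transfer time
        print(f"{interval:>4} s interval -> up to {loss:.0f} s of changes lost")
```

With a 5-minute interval and 20 seconds of transfer time, that’s up to 320 seconds of changes. If a cycle takes longer than the interval itself, cycles start piling up and this simple model no longer applies.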
Hyper-V: Pros & Cons
Let me quickly tell you why I enjoy working with Hyper-V. As you can probably tell by now, I’m quite fond of this hypervisor, which is why I’ve spent some time on its installation and initial configuration. Hyper-V in general, and the free Hyper-V Server in particular, offer the following benefits:
- Support for the most common operating systems. There are no compatibility problems whatsoever, so you can forget about installing additional drivers or tools for a guest system to work. All current Windows versions run as Hyper-V guests, as do the latest versions of the most widespread Linux distributions and FreeBSD (version 10 or higher). To put it simply, all you need is a boot disk.
- Plenty of options for VM backup: simple scripts, free apps, or full-fledged backup software from popular vendors. Keep that in mind, as it’s one of Hyper-V’s primary benefits.
- A standard management console (Hyper-V Manager) that you can install on any Windows machine. Now you can also manage hosts remotely through Windows Admin Center. We’ll talk about it later.
- Hyper-V Server is based on a conventional Windows Server system, so it’s intuitive to work with. If you need to upload or download files, just share a directory on the hypervisor as you would on any Windows machine.
- Hyper-V works with software RAID, such as the built-in Intel RAID controller, or one you build yourself with Windows tools. Of course, SSDs are moving up the ladder very fast these days, and my personal opinion is that RAID is becoming ancient history: the future belongs to software-defined storage (SDS)!
- A fully operational free version, although without comfortable management tools, because why would admins ever need a GUI, am I right?
- Snapshots (checkpoints, in Hyper-V terms) work out of the box. You don’t have to juggle different disk file formats as VMware users do: Hyper-V’s main format, VHDX, supports checkpoints.
You can replicate Hyper-V virtual machines in three ways:
- Built-in hypervisor tools
- Third-party software
- The most interesting, fast, and, as it usually goes, the most expensive method
As I mentioned before, we’re going with option number one: replicating Hyper-V virtual machines with the built-in tools of Windows Server 2019.
So, Microsoft has promised us:
- Asynchronous replication of changed data from the source machine to the replica. Asynchronous means that changes are copied not immediately after the data changes but at a chosen interval, which reduces the load on the network link. In Windows Server 2019, the shortest replication interval is 30 seconds.
- To replicate Hyper-V virtual machines, you don’t need any special storage or other hardware on the source or replica side.
- You can replicate everything you can virtualize.
- Replication runs over ordinary IP networks, and you can encrypt the traffic.
- Replication works not only between standalone hosts but between clusters as well.
- It doesn’t matter where exactly the replication hosts are, even if they are in different networks or different domains.
Checklist
- First things first. It’s quite obvious, but you’d be surprised how many people forget to do it: check that the host the replica goes to has compatible hardware.
Yeah, I know, I know. Engineers can make anything work on anything, but believe me, this is a totally different ball game. Your hardware MUST support virtualization. Most hardware does today, but it never hurts to double-check.
- More obvious things: check how much space is available at the destination and how fast the storage is. If you use HDDs from the Dark Ages of computing, don’t expect things to get any brighter speed-wise. PLEASE, use SSDs.
- As a logical consequence of the previous point, estimate the replication frequency of an average virtual machine and, based on that, how much disk space each recovery point in time (PIT) will take and how many points you can actually afford. My practical advice, from experience: no more than 12 points.
- Since we’ll be replicating VMs from a Hyper-V cluster, you’ll have to install and configure the Hyper-V Replica Broker role in the cluster. If there’s a cluster on the other side too, repeat the same steps there.
- Verify firewalls and routing all along the way between the hosts. Firewalls are the simpler part: TCP port 80 for Kerberos authentication over HTTP and TCP port 443 for certificate-based authentication over HTTPS. Naturally, you can change the ports during configuration.
- If you want to encrypt your traffic, make sure the certificates are distributed to all parties involved beforehand. Don’t forget to check their expiration dates, and if you’re using self-signed certificates, mark them as trusted.
- Check your VMs for unnecessary virtual disks (VHD/VHDX). Chances are you’ll find a lot of data that doesn’t need to be replicated at all! Excluding it saves disk space, lowers network traffic, and reduces the load on your network equipment.
- Choose the time for the first replication wisely. The whole VM has to be transferred, so it’s probably wise to do it outside working hours. If the replica host is outside the local network, a night or even a weekend might not be enough to finish, and you’ll end up with a VERY overloaded data channel or host. I’ll tell you how to avoid that a little later.
- I feel a little weird telling you this, but try to avoid Wi-Fi connections. I know, it’s the twenty-first century and all, but for the sake of a reliable connection, let’s do it the old way.
- Keep in mind that Hyper-V can replicate not only between two hosts but also onward from the replica to a third server (extended replication). Yep, it’s an Inception scenario.
- Check that your current backup plan is compatible with the replication plan. You’ll hardly like it when replication kicks in on top of a running backup job; your host may not forgive you for that! Also, it’s hardly a good idea to replicate a version of the virtual machine BEFORE failover.
- It’s also fair to mention that Microsoft offers a tool that estimates, with an acceptable margin of error, how many resources replicating a particular virtual machine will consume. For Windows Server 2012 R2, it was called Capacity Planner for Hyper-V Replica.
With Windows Server 2019, you can also replicate directly to the Azure cloud, and Microsoft now offers a newer tool for evaluating resources and possibilities: the Azure Site Recovery Deployment Planner. It works in two primary stages, profiling and report generation, and a third mode estimates the available bandwidth (throughput). That’s just what you need.
Minimum requirements for the server that runs the tool are as follows:
- Operating system: Windows Server 2016, Windows Server 2012 R2, or Windows Server 2019;
- Machine configuration: 8 vCPUs, 16 GB RAM, 300 GB HDD;
Of course, it won’t tell you the exact IOPS, network, or CPU load, but as an evaluation instrument, it does well, and it lets you prepare your infrastructure in advance. On startup, you’ll be asked to select the primary server and the replica server and to set the measurement duration. I recommend increasing it from 30 minutes to 1 hour and running it during working hours, while high-load apps are running. It’s also a great way to show your boss how badly your environment needs new toys, I mean, improvements. Just saying.
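If you want a quick sanity check alongside the planner, the arithmetic is simple. Below is a minimal Python sketch that gives an upper-bound storage figure and the average bandwidth needed to keep up with changes. It assumes hourly recovery points that each keep a full hour of changes, and a peak change rate (churn) you’ve measured yourself; the VM names, sizes, and compression ratio are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmProfile:
    """Hypothetical per-VM measurements (e.g. from a planning tool or your monitoring)."""
    name: str
    disk_gb: float            # size of the virtual disks selected for replication
    churn_gb_per_hour: float  # data changed per hour at peak load


def replica_storage_gb(vm: VmProfile, recovery_points: int) -> float:
    """Upper-bound replica storage: base disks plus one hour of churn per recovery point.

    Assumes hourly recovery points that each hold a full hour of changes;
    real usage depends on how the changes overlap.
    """
    if recovery_points < 0:
        raise ValueError("recovery_points cannot be negative")
    return vm.disk_gb + recovery_points * vm.churn_gb_per_hour


def required_mbps(vm: VmProfile, compression_ratio: float = 1.0) -> float:
    """Average bandwidth (Mbit/s, decimal units) needed to keep up with peak churn.

    compression_ratio is an assumption: 2.0 means data shrinks to half its size.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    gb_per_second = vm.churn_gb_per_hour / 3600 / compression_ratio
    return gb_per_second * 8 * 1000  # GB/s -> Gbit/s -> Mbit/s


if __name__ == "__main__":
    # Hypothetical VMs; replace with your own measurements.
    for vm in (VmProfile("sql01", 400, 6.0), VmProfile("web01", 80, 0.5)):
        print(
            f"{vm.name}: ~{replica_storage_gb(vm, 12):.0f} GB with 12 recovery points, "
            f"~{required_mbps(vm, compression_ratio=1.5):.1f} Mbit/s at peak"
        )
```

Treat the average bandwidth as a floor, not a target: changes arrive in bursts, so leave generous headroom.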
Configuring Hyper-V Replica. Yes, it is time!
So, here it comes. Certificates: check. Network configuration: check. Hyper-V role enabled: check. Management tools: check. Finally, we can start. First of all, allow the replica host to accept replicated VMs. This is done in the regular Hyper-V settings:
- Start Hyper-V Manager.
- Select the host in Hyper-V Manager and open Hyper-V Settings.
- Select Replication Configuration.
- Select Enable this computer as a Replica server, then choose the authentication method: Kerberos (HTTP), or the safer certificate-based authentication (HTTPS) using a certificate from a certification authority or a self-signed one.
- You can also specify which servers may send replicas and where the replicas are stored. By default, any authenticated server can send replicas.
- Make sure Windows Firewall on the replica host allows the Hyper-V Replica HTTP Listener (TCP-In) or Hyper-V Replica HTTPS Listener (TCP-In) rule, depending on your authentication method.
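If you go with HTTPS, an expired certificate can stop replication. Here’s a minimal Python sketch for checking expiry dates, assuming you already have each certificate’s notAfter date as text (for example, from your certificate inventory, or the `notAfter` field that Python’s `ssl` module reports for a validated peer certificate). The host names, dates, and the 30-day warning threshold are hypothetical.

```python
import ssl
import time
from typing import Optional

WARN_DAYS = 30  # hypothetical threshold; match it to your renewal process


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative once expired).

    `not_after` uses the same format as the "notAfter" field that
    ssl.SSLSocket.getpeercert() returns, e.g. "Jun  1 12:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def check_certificate(name: str, not_after: str, now: Optional[float] = None) -> str:
    """Human-readable status line for one certificate."""
    days = days_until_expiry(not_after, now)
    if days < 0:
        return f"{name}: EXPIRED {-days:.0f} days ago"
    if days < WARN_DAYS:
        return f"{name}: expires in {days:.0f} days, renew before relying on it"
    return f"{name}: OK ({days:.0f} days left)"


if __name__ == "__main__":
    # Hypothetical hosts and dates, for illustration only.
    inventory = {
        "hv-primary.example.local": "Jan 15 00:00:00 2030 GMT",
        "hv-replica.example.local": "Mar  1 00:00:00 2021 GMT",
    }
    for host, not_after in inventory.items():
        print(check_certificate(host, not_after))
```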
All settings are quite clear, but I’d like to highlight the Authorization and storage section.
It isn’t crucial for the process, but if I were you, I’d allow only specific hosts or groups of hosts to replicate. It doesn’t happen often, but sometimes the wrong host gets replicated. Then you can only hope it’s a spare host and that you’re not just filling up your primary production storage, with all the consequences that follow. Let’s follow Sherlock Holmes’s famous brain-attic theory and not clutter our storage with useless furniture.
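The idea behind allowing only specific servers is a simple allow-list: each entry pairs a server name pattern (with wildcards such as *.corp.example.com) with a storage location and a trust group. Below is a hypothetical Python model of that matching, not Hyper-V’s internal logic; the domain, path, and group names are made up, and the first-match rule is this sketch’s own choice.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional


class AuthEntry(NamedTuple):
    """One allow-list entry in this hypothetical model (not Hyper-V's own data structure)."""
    primary_pattern: str  # e.g. "*.corp.example.com"
    storage_path: str     # where replicas from matching servers would be stored
    trust_group: str


def find_entry(primary_fqdn: str, entries: List[AuthEntry]) -> Optional[AuthEntry]:
    """Return the first entry whose pattern matches the sending server, or None (reject)."""
    name = primary_fqdn.lower()  # host names are case-insensitive
    for entry in entries:
        if fnmatchcase(name, entry.primary_pattern.lower()):
            return entry
    return None


if __name__ == "__main__":
    # Hypothetical domain, path, and trust group names.
    allowed = [AuthEntry("*.corp.example.com", r"D:\Replicas\Corp", "corp")]
    for sender in ("HV01.corp.example.com", "laptop.home.example.net"):
        entry = find_entry(sender, allowed)
        print(sender, "->", entry.storage_path if entry else "rejected")
```

Anything that matches no entry is rejected, which is exactly what keeps a stray host from filling your production storage.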
Since the cluster is up and running, it’s time to set up the Hyper-V Replica Broker role. No cluster? Just skip this part, then. Activation itself is simple: five clicks on Next, one on Finish. I don’t think anybody needs this explained further.
In short: open Failover Cluster Manager, select Configure Role, choose Hyper-V Replica Broker, follow the steps, and don’t forget to give the broker a NetBIOS-compatible name and an IP address.
Everything I’ve just described for a standalone Replica server is available on the Broker as well, the only difference being that it applies to the whole cluster, sparing you the boredom of enabling replication on each host separately.
If you’re wondering what the Replica Broker actually does, it’s easier to say what it doesn’t do: machines outside the high-availability cluster are beyond its reach. All other VMs fall completely under its jurisdiction. The Replica Broker manages all replication and clustering actions for the VMs inside the cluster, leaving zero chance for a wrong availability decision. Remember this like your own last name: from now on, perform all actions through Failover Cluster Manager, or there will be nothing left to manage. After all, even if a meteorite hits your production host, admins, unlike dinosaurs, don’t get to just go extinct: we have a professional responsibility. The worst thing you can do is try to start replication from Hyper-V Manager.
Ready player one!
As the next step, right-click the VM in Failover Cluster Manager and select Replication > Enable Replication. A pretty standard setup wizard will ask you for the name of the Replica server and the connection parameters. You only need to enter these manually if the hosts aren’t in the same domain; otherwise, everything is filled in automatically. The only really important choice at this point is “Compress the data that is transmitted over the network.” Go back to your strategic planning and decide what matters more: getting the data across as fast as possible, whatever the CPU cost, or keeping host performance the priority.
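To get a feel for that trade-off, here’s a minimal Python sketch that uses zlib as a stand-in. This is an assumption for illustration only: it does not reproduce Hyper-V Replica’s own compression, it just shows that repetitive data shrinks a lot for some CPU time, while random-looking data (encrypted or already compressed files) barely shrinks at all.

```python
import os
import time
import zlib


def compression_tradeoff(payload: bytes, level: int = 6) -> dict:
    """Measure how much zlib shrinks `payload` and how much CPU time that costs.

    zlib is only a stand-in: this does not reproduce Hyper-V Replica's own
    compression, it just illustrates the bytes-versus-CPU trade-off.
    """
    start = time.process_time()  # CPU time of this process, not wall-clock time
    compressed = zlib.compress(payload, level)
    cpu_seconds = time.process_time() - start
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(payload) / len(compressed),
        "cpu_seconds": cpu_seconds,
    }


if __name__ == "__main__":
    samples = {
        "repetitive data (logs, zeroed blocks)": b"INFO request served\n" * 200_000,
        "random data (like encrypted files)": os.urandom(4_000_000),
    }
    for label, data in samples.items():
        r = compression_tradeoff(data)
        print(f"{label}: ratio {r['ratio']:.1f}x, {r['cpu_seconds'] * 1000:.1f} ms CPU")
```

If your VMs mostly hold already compressed or encrypted data, compression buys you little and costs CPU anyway.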
Specify the disks you want to include in replication. Important note: a disk NOT selected for replication won’t be replicated at all. If a disk is required for the virtual machine to work properly but its contents are basically rubbish you can live without, exclude it and just create an equivalent disk on the replica machine.
Then, back to your plans to decide on the replication frequency. If you’re still on Windows Server 2012 (for some reason), it will set you up with a 5-minute frequency without the courtesy of asking you first. As time went by, the folks at Microsoft eventually realized that was a little rude, so in Server 2012 R2, you can choose from 30 seconds, 5 minutes, and 15 minutes. That’s no gift from the heavens, but still.
By the way, if you choose the 30-second frequency, you’d better make sure that your network connection, storage, and host performance are off the charts.
Next, configure recovery points and the VSS snapshot frequency. Overall, you can get by without these precautions, but then again, you don’t want to lose data consistency, especially for really vital applications.
The settings in the following screenshot create a recovery point every hour, keep recovery points for 24 hours (the maximum), and take a VSS snapshot every 4 hours.
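To see what those settings mean in practice, here’s a minimal Python sketch listing the recovery points kept over one 24-hour window. It assumes a simplified model: one point per hour, with every fourth one application-consistent (VSS) and the rest crash-consistent. Hyper-V’s actual timing comes from its own scheduler, and the start time is arbitrary.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def recovery_point_schedule(
    start: datetime, retention_hours: int = 24, vss_every_hours: int = 4
) -> List[Tuple[datetime, str]]:
    """Hourly recovery points kept over one retention window, with their consistency type.

    Simplified model (an assumption): one point per hour; every `vss_every_hours`-th
    point is application-consistent (VSS), the others are crash-consistent.
    """
    if not 1 <= retention_hours <= 24:
        raise ValueError("recovery points can be kept for 1 to 24 hours")
    if vss_every_hours < 1:
        raise ValueError("VSS frequency must be at least 1 hour")
    schedule = []
    for hour in range(retention_hours):
        kind = "application-consistent (VSS)" if hour % vss_every_hours == 0 else "crash-consistent"
        schedule.append((start + timedelta(hours=hour), kind))
    return schedule


if __name__ == "__main__":
    points = recovery_point_schedule(datetime(2024, 1, 1))  # arbitrary start time
    vss = [t for t, kind in points if kind.startswith("application")]
    print(f"{len(points)} recovery points, {len(vss)} application-consistent:")
    for t in vss:
        print("  ", t.strftime("%H:%M"))
```

With these settings, you get 24 points to roll back to, 6 of which are application-consistent.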
As we’ve already established, the first time around the whole machine has to be sent to the Replica server. If your VMs are very large or your network simply can’t carry that much data, there are three ways out of the impasse:
- Send the initial copy over the network. The basic default option; nothing more to say.
- Send the initial copy using external media. In my humble opinion, that’s the most interesting option. First, a copy of the machine is exported on the source server, named according to the <VMname_GUID> template, and a placeholder (dummy) VM is created on the Replica server. Save the export to external media and move it to the Replica server. The placeholder VM then waits for you to bring in the actual data with Import Initial Replica: you’ll be asked for the location, the data will be transferred, and voila! It’s done. Just don’t postpone this step too long, because changes keep piling up on the source VM in the meantime.
- Use an existing virtual machine on the Replica server as the initial copy. The least common option, but I’m glad it exists! You simply use an existing VM as the basis, whatever its origin: a backup copy, the result of a previous replication, etc. Only the data that differs is then transferred. To be honest, I have never seen anyone use this scenario.
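Choosing between the first two options mostly comes down to arithmetic: will the initial copy fit into your maintenance window? Here’s a minimal Python sketch, assuming decimal units and that only about 70% of the link is usable in practice (a hypothetical efficiency factor; measure your own, for example with iPerf). The VM size, link speed, and windows are made-up numbers.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to push `size_gb` (decimal GB) over a `link_mbps` link.

    `efficiency` is a hypothetical share of the link you can actually use.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need size >= 0, link speed > 0 and efficiency in (0, 1]")
    megabits = size_gb * 8 * 1000  # GB -> Gbit -> Mbit
    return megabits / (link_mbps * efficiency) / 3600


def suggest_initial_copy(size_gb: float, link_mbps: float, window_hours: float) -> str:
    """Pick network vs. external media depending on whether the copy fits the window."""
    hours = transfer_hours(size_gb, link_mbps)
    if hours <= window_hours:
        return f"network: ~{hours:.1f} h fits into the {window_hours:g} h window"
    return f"external media: ~{hours:.1f} h over the network exceeds the {window_hours:g} h window"


if __name__ == "__main__":
    # Hypothetical VM size and link speed: a night (12 h) vs. a weekend (60 h).
    for window in (12, 60):
        print(suggest_initial_copy(450, 100, window))
```

For instance, 450 GB over a 100 Mbit/s link takes a little over 14 hours at 70% efficiency, which won’t fit into a 12-hour night, but a weekend would do.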
Time to review what you’ve done and click Finish! If all goes well, as it usually does, you’ll be asked whether you want to change the network parameters for the replica, because by default replicas aren’t connected to any network.
Extended Replication
Like tons of other interesting options, Extended Replication first became available in Windows Server 2012 R2. It lets you go beyond point-to-point replication and build a whole chain: after a VM is replicated to a second host, the replica (yeah, I know) is replicated on to a third host.
However, before jumping into this new option, let’s set some ground rules first:
- The extended replication interval CANNOT be shorter than the primary one: if the primary replication interval is 5 minutes, you cannot set extended replication to 30 seconds;
- You can’t change the VSS snapshot frequency at all;
- You can’t change the list of disks participating in replication;
- What you can change is the authentication method and the way the initial copy is sent.
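If you manage several chains, the interval rule is easy to enforce in your own tooling. Here’s a minimal Python sketch that checks a proposed pair of intervals against the first rule in the list above. The labels ("30s", "5m", "15m") are my own shorthand, not Hyper-V parameter values, and no other product restrictions are modelled.

```python
# Replication intervals available since Windows Server 2012 R2, in seconds.
INTERVALS_S = {"30s": 30, "5m": 300, "15m": 900}


def validate_extended_interval(primary: str, extended: str) -> None:
    """Raise ValueError if the extended interval is shorter than the primary one.

    Encodes only the interval rule; any other product restrictions
    are not modelled here.
    """
    for label in (primary, extended):
        if label not in INTERVALS_S:
            raise ValueError(f"unknown interval {label!r}; expected one of {sorted(INTERVALS_S)}")
    if INTERVALS_S[extended] < INTERVALS_S[primary]:
        raise ValueError(
            f"extended interval {extended} is shorter than the primary interval {primary}"
        )


if __name__ == "__main__":
    for primary, extended in (("5m", "15m"), ("5m", "30s")):
        try:
            validate_extended_interval(primary, extended)
            print(f"{primary} -> {extended}: OK")
        except ValueError as err:
            print(f"{primary} -> {extended}: {err}")
```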
Right-click the replica VM and select Extend Replication to open the setup wizard. The rest of the configuration is exactly what we’ve already covered. Since we seem to be done with everything essential for configuring and preparing the process, let’s talk about networks and possible failover scenarios.
We Need to Talk about Safety
You probably all know that many admins prefer to connect replicas to an isolated network rather than the main one. It may seem odd at times, but hey, just because you’re paranoid doesn’t mean they aren’t out to get you, right? Also, sometimes an admin doesn’t really have a choice: the Replica site may use different subnets, so the replica needs different network parameters.
As you can see above, in Failover Cluster Manager, Hyper-V lets you configure each network adapter’s settings for the failover case.
Hyper-V Replica supports three types of Failover:
- Planned Failover
- Test Failover
- Unplanned Failover
Planned Failover
If you choose the Planned Failover scenario, it probably means you know trouble is coming for the primary host: a power outage, shutting down the server for maintenance, a new volcano popping up near your data center, an alien invasion, you name it. Either way, there’s some downtime while you shut down the primary VM and start its replica. Still, I don’t need to tell you how different a switch to the replica on your own terms is from a disaster. Here’s what happens:
1. Shut down the primary VM. This step is manual only, so an accidental shutdown is ruled out. Until you do it, Failover Cluster Manager won’t get tired of reporting an error.
2. Right-click the shut-down VM and choose Replication > Planned Failover.
3. The “Reverse the replication direction after failover” option is not selected by default; if you don’t want to lose the data generated while running on the replica, select it. The primary host, however, must be allowed to accept replicas from the replica host (I’ve already told you that); otherwise, this whole thing won’t work.
4. Start the failover and verify that the machine is available to users. The most common mistakes are a wrong IP address or forgetting to update the DNS records. Failover Cluster Manager checks neither the IP address nor DNS, so that’s up to you.
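Since that last check is on you, it’s worth scripting. Here’s a minimal Python sketch that asks the local resolver where a service name points and compares the answer with the replica’s addresses. The host name and IP address in the usage comment are hypothetical, and keep in mind that clients may still hold cached records until their TTL expires.

```python
import socket
from typing import Iterable, Set


def resolved_ips(hostname: str) -> Set[str]:
    """All addresses the local resolver returns for `hostname` (empty set if it fails)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def dns_points_to_replica(hostname: str, replica_ips: Iterable[str]) -> bool:
    """True if the name resolves to at least one of the replica VM's addresses."""
    return bool(resolved_ips(hostname) & set(replica_ips))


# Usage (hypothetical name and address):
#   dns_points_to_replica("app.example.local", ["10.20.0.15"])
```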
Test Failover
It’s that rare case when something does exactly what you expect it to do. Checking replicas and backups from time to time is what helps admins sleep at night, and what better way to check a replica than to run it? At first glance, you may think Test Failover is just a fancy name for a Planned Failover, but this is Microsoft, so it’s not.
How does it work? Test Failover creates a new VM on the replica site from the recovery point you choose, and you can do whatever you want with it while replication of the original VM carries on. For example, you can check ports with telnet to make sure the services behind them are up and running.
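If telnet isn’t installed (it isn’t by default on modern Windows), the same check takes a few lines of Python. This minimal sketch tries a TCP connection to each port and reports which ones accept; the same function can confirm that port 80 or 443 is open between the hosts before you configure replication. The address and ports in the usage comment are hypothetical.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Try a TCP connection to each port, like `telnet host port`.

    True means something accepted the connection; False means it was refused,
    filtered by a firewall, or timed out.
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # covers refused connections and timeouts
            results[port] = False
    return results


# Usage (hypothetical test-failover VM address and service ports):
#   check_ports("10.30.0.25", [80, 443, 1433])
```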
Unplanned Failover
The first rule of Fight Club, sorry, Unplanned Failover is: you do not use Unplanned Failover. Unless it’s really necessary. In other words, if it’s not an actual failure, power outage, or the like, use the Planned or Test options. If you really have to check whether it works, or you’re writing documentation for engineers, run this scenario in a test environment. During an unplanned failover, the only thing you get to choose is the recovery point. After that, the machine will be started, no matter what. For example, when you run a Planned Failover, Failover Cluster Manager won’t let you shoot yourself in the foot by starting two identical machines: it waits until the primary machine is down. Here, however, the most you’ll get is one formal notification, albeit a strict one. As the final point of no return, you finish the process with the PowerShell cmdlet Complete-VMFailover. After that, the remaining recovery points are erased, and the Unplanned Failover is complete.
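That recovery-point choice matters most when the failure is the kind I described at the very beginning: a problem that may already have been replicated. In that case, pick the latest point taken before things started going wrong. Here’s a minimal Python sketch of that choice, assuming you know roughly when the problem began and have the recovery point times sorted from oldest to newest (the timestamps are hypothetical).

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Optional


def latest_point_before(points: List[datetime], problem_started: datetime) -> Optional[datetime]:
    """Most recent recovery point strictly earlier than when the problem started.

    `points` must be sorted oldest to newest; returns None if none is early enough.
    """
    i = bisect_left(points, problem_started)  # first index with points[i] >= problem_started
    return points[i - 1] if i > 0 else None


if __name__ == "__main__":
    # Hypothetical hourly recovery points and a guessed start time of the problem.
    day = datetime(2024, 1, 1)
    points = [day + timedelta(hours=h) for h in range(24)]
    print(latest_point_before(points, datetime(2024, 1, 1, 10, 30)))
```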
Tips and tricks
- Set aside a day (two would be even better) for tests and planning.
- Move your VM’s page (swap) file to a separate VHDX and exclude it from replication. There’s no need to send it.
- If you change the size of a disk on the source machine, do the same on the replica.
- If you can’t use an isolated network for replication, use network throttling instead. As you should know, replication can consume the entire link bandwidth, so QoS is the only reasonable choice here. It’s quite easy to set limits for vmms.exe or selected ports.
- iPerf is a good tool for checking network bandwidth.
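In practice, you set that limit with a Windows QoS policy (for example, with the New-NetQosPolicy cmdlet), not with your own code. Still, if you’re curious what throttling does under the hood, the classic technique is a token bucket: a sender may burst up to a set size but can’t exceed an average rate. Here’s a minimal, illustrative Python sketch; it is not how Windows implements QoS, and the rates are hypothetical.

```python
class TokenBucket:
    """Classic token-bucket rate limiter, for illustration only.

    Tokens (bytes) refill at `rate_bytes_per_s` up to `burst_bytes`; a send is
    allowed only if enough tokens are available. Time is passed in explicitly
    so the behaviour is deterministic and easy to test.
    """

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        if rate_bytes_per_s <= 0 or burst_bytes <= 0:
            raise ValueError("rate and burst size must be positive")
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call, in seconds

    def try_send(self, nbytes: int, now: float) -> bool:
        """Spend tokens for `nbytes` at time `now` if possible; otherwise refuse."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


if __name__ == "__main__":
    # Hypothetical cap: 12.5 MB/s (100 Mbit/s) with 1 MB bursts; 64 KB chunks offered every 1 ms.
    bucket = TokenBucket(rate_bytes_per_s=12_500_000, burst_bytes=1_000_000)
    sent = 0
    for tick in range(1000):
        if bucket.try_send(65_536, now=tick / 1000):
            sent += 65_536
    print(f"sent {sent / 1e6:.1f} MB during 1 s of offered load")
```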
Conclusions
Well, I think there’s nothing more to say except that virtual machine replication between two Hyper-V hosts in Windows Server 2019 works well. With a few simple steps, you get a secure and transparent way to keep your system fault-tolerant. Be careful and watch your back!