Name resolution
In a dedicated and secured Veeam backup fabric, we need DNS to ensure our backups and restores work well. Ideally, you do not depend on any of the DNS servers you protect, as those might very well be the servers you need to restore. Missing this independent DNS component in your Veeam Backup Fabric design can lead to unpleasant surprises at a very inconvenient moment.
DNS in a Workgroup
In a secured Veeam backup fabric that runs in a workgroup rather than in its own, smaller Active Directory domain, we cannot rely on the DNS servers that Active Directory requires.
Next, we disable NetBIOS over TCP/IP, LMHOSTS lookup, and LLMNR. That means that only hosts files and DNS are left for name resolution. We can still use IP addresses instead, but that can be cumbersome. When protecting Hyper-V clusters, it is necessary to be able to use fully qualified domain names (FQDNs). Name resolution is also handy during regular restore testing and validation, as you do not have to look up IP addresses.
Not using FQDNs/DNS
When it comes to FQDN usage, we have multiple options to consider. Not using DNS is one of them, which means we must maintain a hosts file with all that information. That process becomes tedious in small environments as well as in large and dynamic ones. As it requires manual intervention whenever something changes, drift will occur, and with that, issues will follow. Doing everything by IP address isn’t that attractive at scale in the long run, either.
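To make the drift problem concrete, here is a minimal sketch of how you might detect it, assuming you keep an inventory of expected name-to-IP pairs somewhere. The function names, host names, and addresses are hypothetical and only serve as illustration; nothing here comes from Veeam or Windows.

```python
def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def find_drift(hosts_text, expected):
    """Return {name: (found_ip, expected_ip)} for missing or wrong entries.

    `expected` is a hypothetical inventory: a dict of FQDN -> IP address.
    """
    actual = parse_hosts(hosts_text)
    drift = {}
    for name, ip in expected.items():
        found = actual.get(name.lower())
        if found != ip:
            drift[name] = (found, ip)
    return drift
```

You would have to run a check like this on every host that carries a hosts file, which is exactly the kind of ongoing effort that makes this option unattractive.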
Use the DNS servers in the protected environment
As said before, you can use the DNS server(s) on the domain controller(s) in the environment you are protecting. It is easy and does not require any additional maintenance. During normal operations or restores, this will work just fine. But if ransomware wipes out all existing servers, you will have to do a wholesale restore to redeployed fabric hosts. Maybe they have the same names and IP addresses, or perhaps not. Either way, the DNS servers you depended on are gone as well.
Using one or two dedicated secondary DNS servers
Many restore scenarios exist, big and small, where having the FQDN/IP DNS information makes things easier and better. When it is the DNS server you are restoring, you can’t look up information in it. And yes, some shops still have only one AD/DNS server. Having the DNS records available during a wholesale recovery might also be handy. Name resolution is also very convenient for file- and item-level recovery operations with any of the Veeam explorers. In case you missed it, terms like “handy” and “convenient” are understatements.
I prefer to set up a separate secondary DNS server. The reason for using a secondary DNS server is to be independent of the environment we protect. When I do this, I prefer to run it on the VBR server. You can point all VBR infrastructure servers to it for DNS. This DNS server is configured as a secondary DNS server that receives zone transfers for the zones hosted on the DNS servers in the environment it protects. You could keep using your standard DNS servers and only switch to the secondary DNS server when you have lost the DNS server(s) in the environment you are protecting. But I don’t do that, as I don’t see the point. You are better off using the dedicated DNS server all the time: you know it works correctly, and you don’t have to move to it when needed. Everything is set as it should be under all conditions.
The picture below shows the forward lookup zone and the reverse lookup zone for my VBR secondary DNS server, with two insets showing that I allowed zone transfers to the VBR server (192.168.2.72) on my lab domain controllers with Active Directory-integrated DNS.
If you need the Veeam infrastructure servers’ management IP addresses registered in DNS, you could point the VBR server to a primary or AD-integrated DNS server. But unless you allow insecure DNS updates, you must add those records manually. Remember that the Veeam backup infrastructure hosts are not domain-joined. You should have “secure only” DNS updates enabled on Active Directory-integrated DNS, which means the non-domain-joined Veeam backup infrastructure hosts cannot register themselves, as they need a Kerberos ticket to do so. Having DHCP take care of that is also not on the table as, again, you don’t want your backup fabric to depend on DHCP in the protected environment. So, in the end, using the DNS of the workload you are protecting is not a great option for Veeam. It is, in my opinion, a much-overlooked design problem.
The Veeam Backup & Replication Server has multiple NICs. That is because the Veeam Backup Fabric leverages multiple NICs for dedicated backup and restore traffic (10Gbps in the past, now often 25Gbps and higher). As described in my blog post, this can cause issues at system startup where the DNS service fails to bind to the network interface, and you have no name resolution. That means your backup jobs will fail.
To avoid these DNS issues, we configure DNS to listen only on the management IP address. While this usually fixes the problem, it sometimes still occurs. While rare, it is no good when it happens, as Veeam will fail to find and update the registered servers and clusters and will have issues with backups and restores.
To fix this issue when it occurs, no matter how rare it is, I wrote a script that I run in Task Scheduler after startup and every 5 minutes from then on.
It tests name resolution, and if that fails, it looks for event IDs 404, 407, and 408. If these are present within 15 minutes of the current time, indicating a DNS issue after a server reboot, it restarts the DNS service, which should fix the problem. If another event ID indicates a separate issue, the script logs this with a note that you need to fix it yourself.
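The decision flow just described can be sketched roughly as follows. This is not the PowerShell script itself, only an illustration of the logic in Python under a few assumptions: the caller has already read the DNS Server events into `(event_id, timestamp)` pairs, the function names and return labels are made up for this sketch, and a resolution failure without any of the three event IDs is treated as something to fix manually.

```python
import socket
from datetime import datetime, timedelta

# Event IDs the article associates with the DNS service failing to bind.
BIND_FAILURE_IDS = {404, 407, 408}
# The article's 15-minute look-back window.
WINDOW = timedelta(minutes=15)


def can_resolve(fqdn):
    """Return True if the name resolves through the system resolver."""
    try:
        socket.getaddrinfo(fqdn, None)
        return True
    except socket.gaierror:
        return False


def decide(resolves, events, now):
    """Choose an action label from the resolution result and recent events.

    `events` is a hypothetical input shape: an iterable of
    (event_id, timestamp) pairs taken from the DNS Server event log.
    """
    if resolves:
        return "ok"
    # Keep only events logged within the last 15 minutes.
    recent = [eid for eid, ts in events if timedelta(0) <= now - ts <= WINDOW]
    if any(eid in BIND_FAILURE_IDS for eid in recent):
        return "restart-dns"
    # Assumption: anything else needs a human to look at it.
    return "manual-fix"
```

A caller would pass `can_resolve("some.host.fqdn")` as the first argument and `datetime.now()` as the last. Where this sketch only returns a label, the actual script acts on it by restarting the DNS service or writing a log entry.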
In addition to its own actions, the script also logs other DNS error events to the Application log.
The example script can also be found on GitHub: Public/PowerShell/PoShOnPrem/CheckVeeamFabricDnsService.ps1 at master · WorkingHardInIT/Public (github.com)
There is also a sample XML file with the Windows Task Scheduler implementation. Feel free to download it to see how the job is set up, or use it to set up the job yourself. Don’t forget to adapt the file path and file name when required.
Below, you will see the job in the Task Scheduler.
Conclusion
I hope this helps some of you who might run into this issue. Even when you have set up a Windows DNS server correctly on a multihomed system and have not mixed the DNS server role with RRAS/NAT, you can still, once in a while, see DNS name resolution failures related to event IDs 404, 407, and 408. This is a rare occurrence, and it should not dissuade you from using a dedicated DNS server in your Veeam Backup Fabric. With this script, you ensure that when it does occur, the system will heal, and your backups and restores will have the DNS services they need to work correctly. Logging the script’s findings and actions in the event log provides an easy way to monitor the occasions on which the issue was fixed for you.