Admins shut down their hosts for servicing from time to time. When one node goes down for maintenance, the vSAN cluster has to redistribute its resources, and this is where maintenance mode comes into play. Today, I’d like to discuss the whole idea of maintenance mode and its options.
Enabling maintenance mode on a standalone host
Once the host enters maintenance mode, its icon and state in the Summary tab change.
This sign means that no I/O can be done to that host. There are no active client network sessions either. For these two reasons, you cannot run or create VMs on that host while it is under maintenance: its VMs have to be powered off or moved to another host first.
The host can leave maintenance mode automatically (after some process is finished) or on user request. Note that a reboot alone won’t return the host to normal operation. When you install updates with vSphere Update Manager, though, there’s an option to exit maintenance mode after the reboot.
How to put a host in maintenance mode?
You can activate maintenance mode via vCenter.
For a standalone host, you can enter this mode in the web console too.
For CLI-minded users, there’s a way to enter the mode from an SSH session. Here are 3 commands that may come in handy:
- esxcli system maintenanceMode get reports whether maintenance mode is enabled.
- esxcli system maintenanceMode set --enable true enables maintenance mode on the host.
- esxcli system maintenanceMode set --enable false disables maintenance mode.
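The commands above can be combined into a small helper for an SSH session. This is only a minimal sketch, assuming a POSIX shell on the ESXi host and assuming that the get command prints either Enabled or Disabled. The function name enter_maintenance is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: put the host into maintenance mode only if it is
# not already there. Assumes `esxcli system maintenanceMode get` prints
# "Enabled" or "Disabled".
enter_maintenance() {
    state=$(esxcli system maintenanceMode get)
    if [ "$state" = "Enabled" ]; then
        echo "already in maintenance mode"
        return 0
    fi
    # Note the double hyphen: a typographic dash pasted from a web page
    # is not accepted as a flag prefix.
    esxcli system maintenanceMode set --enable true &&
        echo "maintenance mode enabled"
}
```

After defining the function, calling enter_maintenance runs the check and, if needed, the switch. The state check simply avoids asking for a mode the host is already in.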
You can do the same from a vCenter instance with PowerCLI. Here are some commands:
- Connect-VIServer "My vCenter IP" -User "user@domain" -Password "password" connects you to a vCenter Server instance.
- Get-VMHost -Name "My ESXi host IP" reports the current host state.
- Set-VMHost -VMHost "My ESXi host IP" -State "Maintenance" -RunAsync puts the host into maintenance mode.
- Set-VMHost -VMHost "My ESXi host IP" -State "Connected" -RunAsync brings the host back to the normal state.
- Disconnect-VIServer "My vCenter IP" -Confirm:$false disconnects you from the vCenter Server instance.
Here are the command outputs.
How does it work in a vSAN cluster?
Before I move any further, I’d like to clarify the whole concept of maintenance mode for a host in a vSAN cluster. Long story short: by putting a host under maintenance, you basically disconnect it from the cluster. In other words, you temporarily remove its capacity and compute power from that cluster. Of course, this triggers workload distribution mechanisms, but you need to be really careful: your VMs may become a bit sluggish, and stability may suffer as well.
When you enable maintenance mode on a host, a dialog pops up listing the maintenance mode options that help mitigate these risks:
1. Full data migration
2. Ensure accessibility
3. No data migration
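On the command line, these three options correspond to the values of the --vsanmode flag of esxcli system maintenanceMode set. The sketch below only maps one to the other and prints the resulting command instead of running it. The short keys full, ensure, and none, as well as both function names, are made up for this example.

```shell
#!/bin/sh
# Map a wizard option to the value expected by esxcli's --vsanmode flag.
# The short keys (full, ensure, none) are hypothetical names for this sketch.
vsan_mode_flag() {
    case "$1" in
        full)   echo "evacuateAllData" ;;
        ensure) echo "ensureObjectAccessibility" ;;
        none)   echo "noAction" ;;
        *)      echo "unknown option: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the command for the chosen option, so that it
# can be reviewed before it is pasted into an SSH session.
maintenance_cmd() {
    mode=$(vsan_mode_flag "$1") || return 1
    echo "esxcli system maintenanceMode set --enable true --vsanmode $mode"
}
```

For instance, maintenance_cmd ensure prints the command for the Ensure accessibility option. Printing first is a deliberate choice here: the No data migration variant in particular is not something to trigger by accident.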
Full data migration
That’s the right option if you expect the host to stay shut down for a long time. Keep in mind that migration is an I/O-intensive process that puts a heavy load on the network. So, schedule it for a time when the migration won’t overlap with your production activity. Good news: the Enter Maintenance Mode wizard shows the approximate amount of data that has to be migrated (this figure can be translated into time). Note that the host cannot be shut down until the data transfer is over. The scheme below shows how data are evacuated.
Here’s how to start migrating all the data from the host.
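As for translating the wizard's data estimate into time, a back-of-the-envelope calculation is usually enough. The sketch below assumes you supply both numbers yourself: the amount of data from the wizard and a sustained transfer rate that you consider realistic for your vSAN network. The function name estimate_minutes is hypothetical, and the result ignores any throttling or competing traffic.

```shell
#!/bin/sh
# Rough estimate: minutes needed to move DATA_GB gigabytes at RATE_MBS
# megabytes per second (1 GB is treated as 1024 MB), rounded up.
# The sustained rate is a guess you supply from your own environment.
estimate_minutes() {
    data_gb=$1
    rate_mbs=$2
    # Whole seconds, rounded up.
    seconds=$(( (data_gb * 1024 + rate_mbs - 1) / rate_mbs ))
    # Whole minutes, rounded up.
    echo $(( (seconds + 59) / 60 ))
}
```

For example, estimate_minutes 2048 400 prints 88, that is, roughly an hour and a half to move 2 TB at 400 MB/s. Treat the figure as a lower bound when picking a maintenance window.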
Ensure accessibility
Ensure accessibility is the default maintenance mode option. It works fine when you shut down a host for a short time (e.g., to update ESXi or swap some worn-out parts). Unlike the option discussed above, this one is not recommended when you expect the host to be down for more than one day: there’s a real risk of performance and stability degradation.
This mode strikes a balance between stability and migration duration. Only a partial migration is done: just as much data as is needed to keep VMs running while the server stays shut down. Here’s a scheme showing how Ensure accessibility works. Red flags indicate data that are unavailable for the whole time the node is stopped.
Note that enabling Ensure accessibility mode may affect storage policy compliance. The cluster temporarily loses some resources, so vSAN tries to avoid performance drops, instability, and data loss. The Enter Maintenance Mode wizard shows how much data needs to be evacuated and how many objects may become non-compliant with their storage policy, so that you can free some datastore space if needed.
By default, after you put the host into maintenance mode, there’s a 60-minute window before vSAN starts resynchronizing data to bring VMs back into compliance with their storage policies. You can extend that window by setting a greater Object Repair Timer value.
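The repair delay described above can also be inspected and changed from an SSH session. The sketch below assumes that the timer is exposed as the per-host advanced setting /VSAN/ClomRepairDelay (in minutes), which is how older vSAN releases handle it. Newer releases manage the value at the cluster level in vCenter, so check your version first. Both function names are made up for illustration.

```shell
#!/bin/sh
# Show the current repair delay (in minutes) on this host.
# Assumes the /VSAN/ClomRepairDelay advanced setting is available.
show_repair_delay() {
    esxcli system settings advanced list -o /VSAN/ClomRepairDelay
}

# Set a new repair delay; refuses anything that is not a whole number
# of minutes, so a typo never reaches esxcli.
set_repair_delay() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_repair_delay MINUTES" >&2; return 1 ;;
    esac
    esxcli system settings advanced set -o /VSAN/ClomRepairDelay -i "$1"
}
```

A call such as set_repair_delay 120 would double the default window. If you go the per-host route, keep the value identical on every host in the cluster. Depending on the release, the new value may only take effect after the clomd service is restarted.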
No data migration
As the name implies, no data is migrated at all, which makes this the fastest way to put the host into maintenance mode. It is also the most dangerous one: some VMs may go down because their data is stored only on the host that leaves the cluster.
If you are ready to take that risk, here’s how to put the host into maintenance mode with no data migration.
Conclusion
Maintenance mode makes managing vSphere environments more convenient. Nevertheless, you should be aware of the risks associated with this feature. I believe this article covers maintenance mode well enough and provides the important information one should know before enabling it.