Snapshots in VMware vSphere often cause configuration and performance problems unless they are used as intended: for live backup of virtual machines and for temporarily preserving the VM configuration before an update.
However, in large infrastructures they are unavoidable. At some point you may need to delete or consolidate virtual machine snapshots (the Delete All button in Snapshot Manager), which is time-consuming and places a heavy load on the storage. It is therefore useful to know in advance how long it will take.
As a reminder, starting snapshot deletion with the Delete All function in vSphere Client removes the snapshots from the GUI immediately, but on the storage the process takes much longer. If an error occurs during deletion, snapshot files may remain on the storage. In that case, use the snapshot consolidation function (the Consolidate option in the context menu).
Deleting snapshots, whether with the Delete All button or with the Consolidate function, is called consolidation.
First, let’s look at the factors that affect the duration of virtual machine snapshot consolidation:
- Delta disk size is obviously the most important characteristic. The more data on the delta disk, the longer it takes to apply it to the base disk.
- Number of snapshots (number of delta files) and their size. The more snapshots, the more metadata has to be analyzed before consolidation. Moreover, when there are several snapshots, consolidation runs in several stages.
- Storage subsystem performance, including the FC fabric, Storage Processors and LUNs (number of disks in the group, RAID type, etc.).
- Type of data in the snapshot files (zeros or random data).
- ESXi host server load at the time the snapshot is taken.
- Storage subsystem load generated by the virtual machine during consolidation. For example, on a fully loaded mail server, snapshot consolidation takes much longer.
Note that consolidation is very demanding on the I/O subsystem, so it is not recommended to run it during working hours, when production virtual machines are under load.
So, here are the ways to evaluate performance of snapshots consolidation process.
Check the I/O performance of the storage where VM with the snapshots is located.
To do this, leave only one test virtual machine with snapshots on the storage. The other machines can be temporarily moved off it with vMotion/Storage vMotion.
- First, check the snapshot file sizes in Datastore Browser or with the following command:
ls -lh /vmfs/volumes/DATASTORE_NAME/VM_NAME | grep -E "delta|sparse"
- Add up the sizes of the snapshot files and write the total down. Then find the LUN where the test virtual machine is located (for further details see KB 1014953).
- Start the performance monitoring utility:
# esxtop
- Press the <u> key to switch to the disk device performance view. To see the full device name, press Shift+L and enter 36.
- Find the device that hosts the datastore with the virtual machine and track the MBREAD/s and MBWRTN/s values during snapshot consolidation. To bring the device to the top of the screen, sort the output by MBREAD/s (press R) or MBWRTN/s (press T).
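The size check from the first step can also be scripted. The following is a minimal sketch, not part of the original procedure: the function name `sum_snapshot_mib` is hypothetical, and it assumes that `ls -l` prints the file size in bytes in the fifth column, as the usual long listing format does.

```shell
# Hypothetical helper: total size, in whole MiB, of the snapshot files
# (names containing "delta" or "sparse") in one VM directory.
# Assumption: column 5 of `ls -l` is the file size in bytes.
sum_snapshot_mib() {
    ls -l "$1" | grep -E "delta|sparse" |
        awk '{ total += $5 } END { printf "%d\n", total / 1048576 }'
}
```

Called as `sum_snapshot_mib /vmfs/volumes/DATASTORE_NAME/VM_NAME`, it prints a single number that can be written down instead of adding the sizes by hand.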
Thus, knowing the read/write throughput, the snapshot size and the consolidation time of the test case, you can estimate the snapshot consolidation time for other virtual machines (provided their disk subsystem load profile is nearly identical).
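As a rough illustration of that estimate, the sketch below scales the measured test case linearly. The helper name `estimate_seconds` and the linear relationship between delta size and duration are assumptions made for this example; real consolidation time also depends on the other factors listed earlier.

```shell
# Hypothetical helper: scale the measured test consolidation linearly.
# $1 = total delta size of the target VM (MB)
# $2 = total delta size of the test VM (MB)
# $3 = measured consolidation time of the test VM (seconds)
# Assumption: duration grows in proportion to the delta size.
estimate_seconds() {
    awk -v size="$1" -v test_size="$2" -v test_time="$3" 'BEGIN {
        if (test_size <= 0) exit 1      # avoid division by zero
        printf "%d\n", test_time * size / test_size
    }'
}
```

For instance, if a test VM with 10240 MB of delta files took 600 seconds, `estimate_seconds 40960 10240 600` gives 2400 seconds, or about 40 minutes, for a VM with 40960 MB of delta files.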
Monitor the performance of a specific snapshot consolidation process.
This is a finer-grained approach: it estimates consolidation time by monitoring the vmx process, which performs the snapshot operations in server memory.
- Start the performance monitoring utility:
# esxtop
- Press Shift+V to show only running virtual machines.
- Find the VM on which consolidation is running.
- Press <e> to expand the list.
- Enter the Group World ID (the value in the GID column).
- Note the World ID of the snapshot process (in ESXi 5.x the process is called vmx-SnapshotVMX; in earlier versions it is SnapshotVMXCombiner).
- Press <u> to display disk device statistics.
- Press <e> to expand the list and enter the device to which the VMX writes during consolidation. It looks something like naa.xxx.
- Follow the process by the World ID noted earlier. You can sort the output by MBREAD/s (R key) or MBWRTN/s (T key).
- Track the average value in the MBWRTN/s column.
This evaluation method is more precise, and it can be used even when other virtual machines put a light load on the storage.
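If you write down a few MBWRTN/s readings for the snapshot world, the average and a rough duration can be computed as in the sketch below. The helper name `estimate_from_samples` is hypothetical, and the calculation assumes that the amount of data written during consolidation is roughly equal to the total delta size, which is a simplification.

```shell
# Hypothetical helper: estimate total write time from sampled MBWRTN/s values.
# $1 = total delta size (MB); stdin = one MBWRTN/s sample per line.
# Assumption: the data written is roughly equal to the total delta size.
estimate_from_samples() {
    awk -v size="$1" '
        $1 > 0 { sum += $1; n++ }          # skip blank or zero samples
        END {
            if (n == 0) exit 1             # no usable samples
            printf "%d\n", size * n / sum  # size divided by the mean rate
        }'
}
```

With readings of 40, 50 and 60 MB/s and 30000 MB of delta data, `printf '40\n50\n60\n' | estimate_from_samples 30000` gives 600 seconds.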