Introduction
There is plenty of good material on optimizing virtualized environments, and many of you will enjoy reading it. The topic comes up all over the IT community and covers a wide range of technical questions. This article focuses on one matter that is still not quite clear, especially when moving from theory to practice: the Windows disk write cache feature and its implications for the data consistency and performance of virtual hard drives.
Problem
Disk write caching is designed to speed up system processes and applications by letting them proceed without waiting for data to be written to the disk. In other words, they continue operating while the data is still sitting in the cache, waiting until the underlying storage can accept it. That is, a write request is acknowledged as soon as the data is placed in the cache rather than on the target device. At the same time, there is a risk associated with this feature.
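To make the acknowledgment behavior concrete, here is a minimal toy model in Python. It is only a sketch: the `ToyDisk` class and its methods are hypothetical names invented for illustration, not a Windows or driver API, and a real device cache is far more involved.

```python
class ToyDisk:
    """Hypothetical stand-in for a disk with a volatile write cache."""

    def __init__(self, write_cache_enabled):
        self.write_cache_enabled = write_cache_enabled
        self.cache = {}   # volatile: contents vanish on power loss
        self.media = {}   # persistent storage

    def write(self, block, data):
        """Acknowledge the write; where the data lands depends on the policy."""
        if self.write_cache_enabled:
            self.cache[block] = data   # acknowledged while still only in cache
        else:
            self.media[block] = data   # acknowledged only once it is on the media
        return "ack"

    def flush(self):
        """Destage cached blocks to persistent media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Simulate a sudden power failure: the volatile cache is gone."""
        self.cache.clear()


cached = ToyDisk(write_cache_enabled=True)
cached.write(0, b"payload")
cached.power_loss()
print(0 in cached.media)   # False: the write was acknowledged but never persisted

direct = ToyDisk(write_cache_enabled=False)
direct.write(0, b"payload")
direct.power_loss()
print(0 in direct.media)   # True
```

Both writes return the same acknowledgment to the caller, which is exactly why the application cannot tell the difference until something goes wrong.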
Disk write caching may lead to the loss of cached data
You have to be careful about keeping it enabled, since cached data may be lost in case of a software crash, equipment failure, or sudden disconnection of the device. Microsoft therefore highly recommends acquiring a UPS for the system to prevent data integrity issues. It is also worth noting that Windows itself controls these settings in some scenarios:
- Write cache is always disabled on servers with the Domain Controller role installed, and it is set to “enabled” for applications such as Microsoft Exchange.
- Some applications use FILE_FLAG_WRITE_THROUGH and/or FILE_FLAG_NO_BUFFERING by default, so the write cache is never applied to them.
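The two flags above are Win32 CreateFile options, and Python's standard library does not expose them directly. As a rough, portable analogue of what such applications are after, the sketch below forces each write out with `os.fsync` before reporting success. The helper name `durable_write` and the file name are hypothetical, and whether the data truly reaches the media still depends on the device honoring the flush.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data and return only after the OS has been asked to flush it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # drain Python's user-space buffer into the OS
        os.fsync(f.fileno())   # ask the OS to push its cached data to the device
    return len(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "journal.bin")   # hypothetical file name
    written = durable_write(target, b"x" * 4096)
    print(written, os.path.getsize(target))     # 4096 4096
```

The price of this guarantee is latency: every call waits for the flush, which is the trade-off the caching policy is meant to avoid.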
Research
So how do we configure write caching policies in a Windows environment? This can be done via Device Manager or Disk Management. Select a disk, click Properties, and go to the Policies tab. You will then be given a choice: Better performance (enable write cache) or Quick removal (disable cache).
In this article, we’ll mostly address the performance aspect of the Windows disk cache and see how it affects the write and read speed of virtual hard drives. The reason for writing about it is another article I stumbled across not long ago, which contains its author’s analysis of virtual disk performance under different write caching policies.
Briefly, the author found a modest improvement in write and read operations after disabling the write cache (Quick removal policy), and that is what really got me puzzled. So I decided to deploy a test environment where I could try to reproduce that behavior. In practice, I needed a single hypervisor host and a disk array acting as physical storage for my test VM. To keep the whole configuration comparable, I decided to start with VMware ESXi and a small RAID 0 array of only 3 flash drives.
Lab specifications:
Test VM settings:
All benchmarks were performed using the Microsoft DiskSpd (https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223) tool. Our test process consists of two random access patterns that help us collect basic performance metrics.
We used the following DiskSpd parameters:
4K 100% RANDOM WRITE:
diskspd.exe -t8 -b4K -r -w100 -o40 -d30 -h -L -c20G
4K 100% RANDOM READ:
diskspd.exe -t8 -b4K -r -w0 -o40 -d30 -h -L -c20G
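For readers less familiar with DiskSpd, the following Python sketch spells out what each switch in the two commands means, as I understand the DiskSpd help text, and rebuilds both command lines from named parameters. The function names are hypothetical helpers, and the target file path is not shown in the commands above, so it is left out here as well.

```python
# Meaning of each switch, as I read the DiskSpd help text.
FLAGS = {
    "-t8": "8 worker threads per target",
    "-b4K": "4 KiB block size",
    "-r": "random I/O",
    "-w<N>": "N percent of requests are writes (100 = all writes, 0 = all reads)",
    "-o40": "40 outstanding I/Os per thread per target",
    "-d30": "30-second test duration",
    "-h": "disable software caching and hardware write caching",
    "-L": "collect latency statistics",
    "-c20G": "create a 20 GiB test file",
}


def build_command(write_pct, threads=8, block="4K", queue_depth=40,
                  duration_s=30, file_size="20G"):
    """Rebuild the DiskSpd command line used in the tests (target path omitted)."""
    parts = ["diskspd.exe", f"-t{threads}", f"-b{block}", "-r", f"-w{write_pct}",
             f"-o{queue_depth}", f"-d{duration_s}", "-h", "-L", f"-c{file_size}"]
    return " ".join(parts)


def iops_to_mib_per_s(iops, block_bytes=4096):
    """Convert an IOPS figure at a given block size into MiB/s."""
    return iops * block_bytes / 2**20


print(build_command(100))        # the 4K 100% random write command
print(build_command(0))          # the 4K 100% random read command
print(8 * 40)                    # 320 I/Os in flight per target
print(iops_to_mib_per_s(25600))  # 100.0
```

Note that 8 threads with 40 outstanding I/Os each keep 320 requests in flight against every target, so these tests measure the storage under heavy queue depth rather than single-request latency.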
Test results
100% Random Write (single drive)
100% Random Read (single drive)
We also decided to run an additional test against 5 targets to check whether the trend would continue. In our case, these were 5 similar virtual disks attached to the same VM.
100% Random Write (5 drives)
Conclusion
We haven’t seen any performance boost after disabling the write cache on virtual hard drives in our test lab. On the contrary, the results point in the opposite direction, although the minor difference in IOPS you might have noticed looks more like measurement deviation. I also had to run many benchmarks to get more or less consistent results, since they varied from test to test.
In summary, though, I must agree that switching to the Quick removal policy is still a good idea, especially in virtualized environments. As shown in practice, the disk write cache doesn’t significantly affect performance, but an accident may lead to a very unpleasant experience with data loss or data corruption. To complete the picture, I am going to run further benchmarks on other hypervisors and on an Azure VM. It would also be nice to test fast NVMe cards. And that is a really good excuse to continue this series of articles on the Windows disk write cache and its impact on performance.
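One simple way to judge whether a gap between two policies is more than measurement deviation is to compare it with the run-to-run spread. The Python sketch below does that. The IOPS lists are made-up placeholders, not the numbers measured in this article, and the "gap smaller than one standard deviation" rule is just an assumed rule of thumb, not a formal statistical test.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return the mean and the coefficient of variation (in percent) of repeated runs."""
    m = mean(runs)
    return m, stdev(runs) / m * 100


def difference_is_noise(a, b):
    """True when the gap between the means is within the larger run-to-run spread."""
    spread = max(stdev(a), stdev(b))
    return abs(mean(a) - mean(b)) <= spread


# Placeholder values for illustration only (NOT measured results).
cache_enabled = [1000, 1040, 980, 1020]
cache_disabled = [1010, 990, 1030, 1000]

print(summarize(cache_enabled))
print(summarize(cache_disabled))
print(difference_is_noise(cache_enabled, cache_disabled))   # True
```

With real data, feeding in the IOPS from each repeated DiskSpd run per policy would show at a glance whether one policy is actually ahead or the two are statistically indistinguishable.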