This is a short overview of the Microsoft Resilient File System, or ReFS. It introduces the subject and outlines the file system's main characteristics and intended use. This post opens a series dedicated to ReFS and serves as an introduction to the practical articles; all the experiments showing how ReFS actually performs are also listed in the blog. ReFS looks like a strong replacement for NTFS, and its resilience is especially valuable in cases where data loss is unacceptable. Paired with Microsoft Storage Spaces Direct, the file system performs automatic corruption repairs without any user intervention.
ReFS (Resilient File System – https://msdn.microsoft.com/en-us/library/windows/desktop/hh848060%28v=vs.85%29.aspx) is Microsoft’s proprietary file system introduced with Windows Server 2012 as a replacement for NTFS. It features enhanced protection against common errors and silent data corruption. Essentially, it is a file system that can repair corrupted files on the fly, provided the underlying storage is redundant and compatible. If the damage cannot be fixed automatically, the “salvage” process is localized to the corrupted spot and does not require taking the volume offline. ReFS protects its metadata by default and offers optional data protection (integrity streams) on a per-volume, per-directory, or per-file basis. Error scanning is proactive: a data-integrity scanner periodically sweeps the volume, identifying latent corruption and triggering the repair process. The file system is also highly scalable, targeting extreme disk capacities and file sizes: it supports files up to 16 exabytes and a theoretical maximum volume size of 1 yottabyte (one trillion terabytes).
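For reference, the per-volume and per-file protection described above is managed through the Windows Storage PowerShell module. The snippet below is only a minimal sketch, not a setup from our tests; the drive letter, label, and file path are placeholders. It formats a volume as ReFS with integrity streams enabled and then inspects and changes the setting on an individual file.

# Format a new volume as ReFS with integrity streams (data checksumming) enabled
# for files created on it. Drive letter R: and the label are placeholders.
Format-Volume -DriveLetter R -FileSystem ReFS -NewFileSystemLabel "ReFS-Test" -SetIntegrityStreams $true

# Check whether integrity streams are enabled (and enforced) for a given file.
Get-FileIntegrity -FileName 'R:\VMs\demo.vhdx'

# Enable integrity streams for a single file that was created without them.
Set-FileIntegrity -FileName 'R:\VMs\demo.vhdx' -Enable $true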
We decided to run some research to see how ReFS works. In the following tests, we’ll look at how well the Resilient File System performs under a typical virtualization workload. To keep the articles concise, we’ve split the tests into parts. The first part, shown below, is a study of I/O behavior in ReFS with the FileIntegrity option off and on; the idea is to see how the scanning and repair processes affect it. You can check it out here: https://slog.starwindsoftware.com/refs-virtualization-workloads-test-part-1/.
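As a rough illustration of the kind of toggle involved in that comparison (the folder path below is a placeholder, not our actual test data), FileIntegrity can be switched off and on for existing files with the same Storage-module cmdlets:

# Turn checksumming off for every file in a test folder before the first run...
Get-Item -Path 'R:\VMs\*' | Set-FileIntegrity -Enable $false

# ...and back on before the second run, so the two I/O profiles can be compared.
Get-Item -Path 'R:\VMs\*' | Set-FileIntegrity -Enable $true

# Verify the current state of each file.
Get-Item -Path 'R:\VMs\*' | Get-FileIntegrity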
In the second part, we’ll see how ReFS maintains performance and what exactly influences it. There is a common misconception that checksumming has a huge impact on ReFS performance, so we are determined to show how “huge” this impact really is and what actually happens. Our own bet is on the FileIntegrity option, which may well be what disrupts I/O.