Whenever I try to approach some new technology I haven’t worked with before I’m trying to start with basic understanding and fundamentals of it instead of rushing on to setting up servers mindlessly so that I can end up in a situation all too familiar to many IT professionals: “here it is installed and configured yet I’m still not quite sure what it is” 😊
The other day I decided to explore SQL Server AlwaysOn feature and committed myself to write an article or two about this technology. Extra incentive there for me was also the fact than more and more K2 customers using AlwaysOn to protect their K2 database and I am getting more and more exposure to this feature in my day to day work. It turned out to be surprisingly difficult to have a grasp of what is AlwaysOn conceptually, just because as soon as you start reading up or configuring it you will see that it is rather amalgamation of different technologies which you’ve probably seen before separately, which were improved and joined together under new marketing term/name to provide you with comprehensive disaster recovery (DR) and high availability (HA) solution which gives you more than SQL Server Failover Clustering (FC) combined with deprecated SQL mirroring feature. Though in case you have tons of experience with these two you probably don’t need any introduction to AlwaysOn either.
I guess once you will read through this article you will also notice how this technology is shaped out of familiar building blocks, but what I also hope for is that you will get a grasp of what is AlwaysOn, and get a basic understanding of how it works. I will fully cover installation and configuration of SQL Server AlwaysOn environment (using new and shiny SQL Server 2017) in my next article while focusing on technology overview here.
Normally I put a short conclusion/recap at the end of my articles, but to cater for the needs of modern reading audience with short attention span and purpose of this article I will give short and basic definitions in the very beginning so that you can get them first and if you want to go into more details you can just read on and learn more details.
First two terms to clarify here is HA and DR – two things SQL AlwaysOn is all about.
HA is ability of an application or service to run without downtime for prolonged periods of time. This time is normally expressed in nines which translate into amount of acceptable downtime. Beyond of available downtime windows HA uses different technologies to ensure that service interruption not happens, and our application runs without downtime. Essentially it is all about providing maximum uptime. In a layman terms you can say that HA is an ability to hide outages from end users.
DR, in turn, intended to address major disasters/problems which make it impossible to run our application without interruption (e.g. loss of an entire server or datacenter). It is more about recovery time (i.e. how quickly our application is up and running on another server or in another datacenter). In a layman terms it is an ability to recover from outages quickly.
HA and DR almost always go together and there is even a unified acronym “HADR” which refers to all HA and DR technologies and approaches, sometimes people even using and pronouncing it as word “hadr”. HA ensures that users see no outages, and DR ensures that if there is an outage, we can recover from it quickly (there is some downtime, but we are trying to minimize it). From these definitions you can see that HA and DR have different yet closely related goals.
When it comes to SQL Server there is two most prominent HADR technologies we used to have before Always On release – SQL Server FC instances and database mirroring. Though extended list of SQL server technologies which can be used in for HADR in one way or another may look as follows: log shipping, replication, mirroring, AlwaysOn AG, clustering.
We have had SQL Server FC as an HA solution for along time. FC itself is a Windows Server technology which predates SQL Server (available since Windows NT 4.0) and it is supported in SQL Server since version 6.5, though in reality SQL Server FC instances or FCIs become widely used at around the time of SQL Server 2000. With SQL Server FCI we have a SQL Server instance level protection / operating system level protection with shared identity and ability to move our SQL instance to another node in case of primary node failure. With FCI we not protecting databases, we protect the server and safeguarding our SQL Server instance against server failure. If server A fails shared resources (disk, SQL instance with databases) move over to instance B. FCI is not a load balancing (LB) solution for SQL as Active/Active mode for the same instance is not supported. Important point about SQL FCI is need for shared storage which adds extra layer of complexity and potential single point of failure.
In SQL Server 2005, Microsoft introduced SQL Server database mirroring to offer database level protection which can offer both HA and DR (depending on synchronization mode you configure). Cool thing about it is that we can have zero latency and zero data loss (databases are completely in sync). Conceptually you may think about mirroring as a continuous log shipping, as is continuously takes changes from your database log and ships them to a partner server continuously whereas log shipping doing this on schedule (underlying mechanisms for these two technologies are different though). Mirroring also has automatic failover to the partner. Yet there are major downsides/limitations: your failover target is unusable (meaning you cannot use it for let’s say reporting or other read-only operations) and you can only have one failover partner, it also works only on one database level (mirroring for each database must be configured separately).
Always On is a feature which was first introduced in SQL Server 2012, known as code-name Denali in its early pre-RTM days (initial feature set was first made publicly available in CTP1 of SQL 2012). With release of SQL AlwaysOn Microsoft started to talk about deprecation of database mirroring feature, yet with huge enterprise client base phasing out of products and technologies is not an easy thing to do, so up to now SQL Server database mirroring is listed as a feature not supported in the next version of SQL Server both for SQL Server 2016 and it is still listed as such for SQL Server 2017 😊
AlwaysOn offers both HA and DR options for SQL server and it is an umbrella term of SQL FCI HA and improved mirroring feature set in the form of SQL Server AlwaysOn Availability Groups (AG). You can also consider it as a replacement for database mirroring. Always On uses Windows Server FC as a foundation for enabling Availability Group for database (or databases) level availability and for creating a failover cluster instance for instance level availability. Another notable distinction is ability to use direct attached storage (DAS) instead of shared storage (SAN/SMB).
Image 1. AlwaysOn and FCI comparison. Slide from Microsoft Virtual Academy course “Data Series: Platform: SQL Server”
Bottom line here is that unless you inherited legacy SQL server mirrors in your environment(s) to take care of you should always plan your SQL HADR design with AlwaysOn feature set in mind.
Like database mirroring AlwaysOn can also provide zero latency and downtime. But I guess you more interested in what is new in AlwaysOn if we compare it with database mirroring? Basically, it addressed a lot of critical limitations we had in database mirroring and now we have:
- Multi-database failover (i.e. we can define group of databases as a single unit of failover and synchronization), with mirroring you had to configure mirroring on each database separately
- Multiple secondaries (maximum of 4 in initial release), whereas with mirroring you were limited by only one secondary/replication partner per database
- Active secondary (meaning we can now do true load balancing/have Active/Active cluster) or readable mirror, whereas with mirroring standby replica was unusable – it was just sitting and waiting to takeover during failover scenario
- Virtual name shared amongst all your replicas (virtual listener) – ho client-side connection strings adjustments necessary to ensure successful reconnect after failover
With this feature set of SQL Always On there are three main design patterns for SQL HADR:
- Using multi-site Failover Cluster Instance (FCI) for local high availability and disaster recovery solution
- Using Availability Groups (AG) for local high availability and disaster recovery solutionSQL Server
- Using Failover Cluster Instance (FCI) for local high availability, and Availability Groups (AG) for disaster recovery solution
To better highlight benefits of SQL Always On you can consider the following Always On use-case/SQL Server HADR design example:
You can have set of databases managed as a single AlwaysOn AG with single listener and fully automated failover. This AG is in turn can be synced across four AlwaysOn nodes: two nodes in the same data center configured as failover partners for HA, third one as a read only node for analytics/reporting and fourth node hosted in separate data center for DR.
You can see how this layout covers HADR requirements with an added benefit of performing your backups from secondary node so that our backups do not have impact on performance in production.
I hope that after we went through terms and features you have an idea of what is SQL Server AlwaysOn and how you can use it. I now want to give a little bit more extra details about SQL Server AlwaysOn and finish off this introductory article with brief overview how its feature set evolved from SQL Server 2012 to SQL Server 2017.
As you can see already SQL Server AlwaysOn is HADR solution for SQL Server which simplifies management and utilization of hardware resources involved into SQL Server HADR. With Always On you can configure HADR for application database(s) and/or entire instance.
From the technology standpoint AlwaysOn leverages things we saw before both in SQL FCI and database mirroring. SQL FCI is only HA solution which supports automatic failover and guarantees instance/OS level redundancy. It is not truly loaded balancing solution for SQL as SQL FC does not support Active/Active mode for one and the same SQL Server instance. Though with SQL FC you can do something close to Active/Active mode with two SQL Server instances within one cluster when Node A has Active instance A and Node B has active instance B, but it is not true LB for a single SQL Server instance (which at any given moment runs on only one active instance), it is Active/Active only in a sense that two nodes of SQL FC run two different active SQL Server instances and act as a failover partners for each other. SQL Server database mirroring is something that gives you DR and HA/LB depending on the way you configure it. Roughly speaking AlwaysOn mixes these two familiar technologies adding database availability groups so that final mix allows you to meet both HA and DR goals more efficiently.
To provide availability for database(s) with AlwaysOn we have AlwaysOn Availability Group. This core subcomponent name causes a bit of confusion and people sometimes using unnecessarily redundant terms, such as “AG Group” (i.e. Availability Group Group) for example 😊
To provide availability for entire instance we have AlwaysOn Failover Cluster Instances. Here is where it all gets a little bit confusing. To make it straight we have Windows Server Cluster which may refer both to NLB and FC, so more precise term is FC, next we can have SQL Server Cluster which is SQL Server installed on top of Windows FC (aka SQL FCI), and next we have AlwaysOn Failover Cluster Instance which basically refers to Windows FC used as a foundation for Always On AG functionality. Up to SQL Server 2016 Windows FC service was required for enabling AlwaysOn AG functionality and it was used mainly for health reporting and witnessing for AlwaysOn AG nodes.
Sound a little bit confusing, right? But as soon as you familiarize yourself with new things and design patterns possible with AlwaysOn and go through set up of your own test lab it all should become clearer (especially with that SQL FCI versus AlwaysOn FCI thing, which in practice just looks as a slight difference in set up process). It is almost the case when not knowing technologies we had before almost helps 😊 Another problem here is that a lot of documentation still intertwined and uses both old (pre-Always On) and new terms.
As I already mentioned, database mirroring is a “deprecated” technology which is being replaced by AlwaysOn AG. If you worked with database mirroring you will find familiar concepts in AlwaysOn AG if didn’t work with mirroring that’s not a big problem as with Always On you can do more of the same and it is easier to do it 😊.
While FC is protecting SQL instance with use of shared storage, database mirroring offers database level protection by means of syncing (mirroring) them to other instances (mirrors) and it works a bit better for geo-stretched designs when moving shared storage during failover is not always a viable approach while continuous synching works. In a way it is similar to logs shipping. There are two modes of mirroring – high-performance with risk of possibility to have lost transactions (asynchronous) or safe mode with higher latency (synchronous). Synchronous mirroring can be DR solution while asynchronous one can be HA solution (when used with witness). As I mentioned above database mirroring had certain limitations most critical of them being that it does not have shared identity similar to cluster name in FC and inability to have more than one secondary. All these limitations have been addressed in AlwaysOn.
Let’s recap core terms/components we were discussing in a shortest form possible:
AlwaysOn – it is a term which refers to bundle of technologies to provide SQL HADR
AG – SQL Server technology which protects one or more DB on a SQL Server instance as a logical group, requires AlwaysOn
Node – Windows server participating in AG Group
Instance (SQL) – Windows Server with SQL installation with DB belonging to AG
Replica – DB protected by AG
Primary Replica – R/W DB participating in AG (we can have a more than one R/W)
Secondary Replica – secondary DBs which can run synchronously, asynchronously, RO and/or as a failover partner.
Always On AG includes everything listed above to provide complete SQL HADR solution.
To finalize my Always On overview below you can see brief overview of how its feature set evolved from its initial release in SQL Server 2012 up to now.
SQL Server 2012 was initial release of Always On which supported maximum of four secondary replicas. Initial set of features looked as follows:
- Multi database failover
- Multiple secondaries (4 maximum)
- Synchronous and asynchronous data movement
- Support for 2 synchronous secondaries for additional data protection
- Built in compression and encryption of transport
- Automatic, Manual and Forced Failover
- Flexible failover policy
- Automatic Page Repair
- Active Secondary
- Readable secondary
- Secondary backup
- Automatic application redirection using virtual name
- Configuration Wizard for simplified deployment
- Always On Dashboard
- System Center Integration
- Automation using PowerShell
Next, SQL Server 2014 added the following:
- Support for a secondary in Azure
- Increased maximum number of secondary replicas from four to eight
- Increased availability of readable secondaries in multi-site environments
Later, with SQL Server 2016, we got the following improvements:
- Round-robin load balancing in readable secondaries
- Increased number of auto-failover targets from 2 to 3
- Enhanced log replication throughput and redo speed
- Support for group-managed service accounts
- Support for Distributed Transactions (DTC) which applies to distributed transactions between databases hosted by two different SQL Server instances or between SQL Server and another DTC-compliant server.
- Basic HA in Standard edition
- Direct seeding of new database replicas
One of the most anticipated feature from this list was support for HA in Standard edition, though what we get in Standard Edition of SQL Server 2016 is “Basic Availability Groups”, which a limited so that they give you the same capabilities as you had with database mirroring. Basic availability groups provide failover support for single database and allow you to have single replica for primary database using either synchronous or asynchronous commit mode. It was done just because without this a lot of people weren’t able to accept SQL AlwaysOn as a full replacement for database mirroring which was available both in Standard and Enterprise editions.
Installation without domain was made possible with release of Windows Server 2016 where Windows Server Failover Clusters are no longer require that all nodes in a cluster reside in the same domain. You can read up about features added in SQL Server 2016 AlwaysOn in this MSFT SQL Server blog post.
And finally, SQL Server 2017 introduced the following new Availability Groups functionality:
- Clusterless support (i.e. you can now enable AlwaysOn Availability Groups without having FC installed on Windows Server or without Pacemaker on Linux). I think it is worth clarifying as to what exactly is possible without installation of Windows FC (note here we speak about installing WFC components and not about configuring SQL FCI). Without FC you can only have read-scale availability group. I guess I will quote official Microsoft documentation here: SQL Server 2017 introduces two different architectures for availability groups. AlwaysOn availability groups provide high availability, disaster recovery, and read-scale balancing. These availability groups require a cluster manager. In Windows, failover clustering provides the cluster manager. In Linux, you can use Pacemaker. The other architecture is a read-scale availability group. A read scale availability group provides replicas for read-only workloads but not high availability. In a read-scale availability group there is no cluster manager.
It means despite you can now enable that crucial checkbox depicted below on a server without WFC components installed it does not mean that WFC dependency is fully removed and you no longer need that foundational building block for complete AlwaysOn functionality.
Picture 2. AlwaysOn can now be enabled without WFC, yet it will only work for read-scale availability group.
- Minimum Replica Commit Availability Groups setting (enables users to set the minimum number of replicas that are required to commit a transaction before committing on the primary)
- Windows-Linux cross-OS migrations and testing (i.e. you can now have primary replica on Windows Server and secondary on Linux).
- Cross-database transactions are now supported among all databases that are part of an AlwaysOn Availability Group, including databases that are part of same instance.
This concludes overview of AlwaysOn feature set evolution. We can clearly see that by now SQLAlways On is a mature feature which positioned by Microsoft as technology of choice for mission-critical HADR for SQL data platform and meets the requirements of being reliable, integrated, flexible and efficient. Indeed, after looking at feature possibilities we discussed SQL AlwaysOn really offers unified approach to SQL HADR which covers different areas, and it saves you from the need of stitching HADR solution from different technologies acquired from different vendors – removing complexity in support and management.
Picture 3. SQL AlwaysOn as a unified HA solution, slide from Microsoft Virtual Academy course “Data Series: Platform: SQL Server”
I hope this article was sufficient for initial technology overview and in the next one I will be covering installation and configuration of SQL Server AlwaysOn AG step by step using SQL Server 2017. In case you want to read up more on the specifics of this technology you may refer to the official Microsoft documentation for Availability Groups, and I should say that Microsoft documentation really becomes better and better nowadays. In case you have any questions or feedback about this article just let me know in the comments below.