They are multiple options to implement HADR solution for SQL Server in AWS public cloud. The easiest way to do that is to use AWS Database as a Service (DBaaS) product known as AWS RDS and enable Multi-AZ option using your AWS Control Panel (Fig #1).
That is all it takes to roll out or add it to an existing instance. AWS will take care of the rest under the hood (depending on your SQL Server version, it might use Mirroring or Always On Availability Groups). It will provision all the necessary things for an automatic failover (witness, network, storage, etc…), so when a primary node will go down, it will be replaced with a secondary node. No ifs and no buts. This blog post is going to discuss how to build your own HADR solution using multiple AWS EC2s (BYOC) instead of a managed AWS RDS service.
BYOC option is going to involve more work, but you can keep the control save some money. That option is going to involve building your own HADR solution utilizing AWS EC2. To be more specific, we would need multiple zones, multiple virtual servers (EC2), SMB. We would need to install and configure everything.
The main question is why bother, right? Building SQL Server clusters is not for everyone. If AWS RDS does all the provisioning and your HADR implementation is just few clicks away, why bother with building HADR using AWS EC2? Here are few reasons against AWS RDS:
- AWS RDS is generally more expensive (Fig #2), so “the few clicks away” HADR solution is going to cost you extra per hour, all the time
- Licensing – AWS RDS doesn’t support bring your own license (a.k.a. BYOL), so you can’t use your existing license
- Less than optimal Service Level Agreement (a.k.a. SLA) – AWS RDS is 99.95%
- Long Recovery Time Objective (a.k.a. RTO) – it takes 60-90 seconds for AWS RDS to recover/fail-over
- Loss of admin control – AWS RDS neither gives you access to the HADR implementation nor access to the file system
- Features – AWS RDS is not 100% compatible with SQL Server standalone version. For example, you can’t have CLR functions
- Implementation – AWS RDS for SQL Server is using Mirroring/Availability Group (depends on SQL Server version) in a synchronous-mode which guarantees data consistency at the expense of availability (your primary node is dependent on a second node). What that means is you would lose your primary node if AWS RDS can’t connect to the secondary node
Sounds like AWS RDS Multi-AZ might not be the most optimal solution for SQL Server. Additionally, can I pay less and get more? Absolutely, let’s discuss how to replace AWS RDS with AWS EC2. That being said, you need to be OK with one-time investment of building your own HADR solution.
Let’s start with the price. AWS EC2 is already less expensive on hourly rate than AWS RDS (Fig #2), but what about SQL Server licensing? The most budget friendly version of SQL Server is Standard Edition (a.k.a. SE). With SE on Windows, we can achieve HADR using Failover Cluster (FCI), Always On Availability Group or a combination of both. While starting from SQL Server 2017, SE comes with Always On Basic Availability Group (a.k.a.. AG) setting it up and using is really cumbersome (one group per one database is far from manageable), so if we don’t need scale out reads, we can just build FCI and call it for the day. Worth mentioning is that FCI on SE is limited to 2 nodes, but less nodes means less $$$ to spend, right?
Let’s discuss features. AWS EC2 will give us virtually 100% administration control on the server while FCI implementation will give us more SLA (99.99%), 100% SQL Server compatibility, RTO of seconds, and to top it off – FCI is a pure availability solution (loss of secondary node has no effect on a primary node).
One thing that we didn’t discuss yet is the FCI requirement for a shared storage – somehow the storage needs to be local but shared between nodes. While in your own private cloud / data center / garage, your storage solutions are only limited by money and time, your storage configuration options in a public cloud are limited. You can’t turn on your new SSD blade storage / SAN as easy as before, especially staying budget friendly. Luckily, all we need is a storage replication solution.
The following solution will guide you on how to implement BYOC SQL Server without leaving the AWS “bubble”.
We would need 2 same-size EC2 servers with some minimum EBS volumes for the overhead, one inexpensive EC2 as a witness to support an automatic failover, 2 EBS volumes for the data, SIOS DataKeeper Cluster Edition for storage replication (to provide a shared storage), and FCI built on top of Windows Server Failover Cluster (WSFC).
- Create 2 x EC2 instances (same OS and instance size) for the 2 cluster nodes
- Create 1 x EC2 (inexpensive t1.micro instance would do the job)
- Create and attach EBS volumes to both nodes
- Make sure that all 3 servers that you want to add as cluster nodes are joined to the same Active Directory domain.
(Optional) Create an organizational unit (OU) and move the computer accounts for the servers that you want to add as cluster nodes into the OU. As a best practice, we recommend that you place failover clusters in their own OU in AD DS. This can help you better control which Group Policy settings or security template settings affect the cluster nodes. By isolating clusters in their own OU, it also helps prevent against accidental deletion of cluster computer objects.
- Make sure that the account you want to use to create the cluster is a domain user who has administrator rights on all servers that you want to add as cluster nodes.
- Make sure that either of the following is true:
Create separate OU for the cluster. Grant Domain\Cluster account Full Control on this OU. Move cluster objects (computer accounts, cluster objects, Domain\Cluster user) to this OU.
Grant Domain\Cluster Create Computer Objects and Read All properties permissions on the default container with computer objects for the domain. Remove them once the objects are moved to the cluster OU. For more information, see Prestage Cluster Computer Objects in Active Directory Domain Services.
- Join all 3 servers to the same domain
- Add Domain\Cluster as a local admin to both servers
- Uninstall SQL Server 2017 Standard Edition from EC2 node1 and node2
- Add host file entries to both nodes
- Trust all computers and users for Kerberos delegation in AD
- Add SPNs
From this point you can follow excellent instructions here – Deploying DataKeeper Cluster Edition in AWS – SIOS DataKeeper Cluster Edition – 8.6.1 that will guide you how to:
Add Failover Clustering feature
Validate Cluster Configuration
Create Windows Server Failover Cluster (WSFC)
Install and Configure SIOS Data Keeper
Install SQL on the First SQL Instance
Important – when installing SQL Server Failover Cluster, make sure not to enable Analysis Services (Fig #3)!
Install SQL on the Second SQL Instance
Adjust TTL to less than a minute
HUGE thank you! The solution described in this blog post would not be possible without the following individuals (alphabetical order):
Allan Hirt, David Bermingham, Nick Rubtsov, Vadim Kulikov.