Deploying SQL 2017 AlwaysOn Availability Group Clustering on AWS EC2 Instance running Windows Server 2016

The purpose of this post is to walk you through how to setup SQL Server AlwaysOn Availability Group clustering environment running on EC2 instances in AWS cloud. This is completely based on the lab setup, while the steps are going to be similar in the case of production environment setup, however the resource requirements (specifically capacity) may vary based on the environments and use cases. Although the resources can be scaled-up/scaled-out on the fly in AWS cloud based on the demand but it is always recommended to benchmark the performance and network requirements along with capacity planning for storage.

High Level Environment Details and Key points:
1. This is two Node SQL AlwaysOn Availability Group (AG) Cluster with a File Share Witness (For Cluster Quorum), Each node is deployed in separate Availability Zone (Subnet) in a VPC.



2. File Share Witness is deployed in different Availability Zone than SQL Cluster nodes to sustain AZ level failure.

3. Each node has single Elastic Network Interface (ENI) with 3 Private IP addresses.
(A). First Primary IP address will be configured statically on the NIC at OS level for each node and it will be cluster node’s primary IP Address.
(B). Second IP address will be used as Cluster IP Address which will be provided during WSFC configuration wizard for both nodes.
(C). Third IP address will be used for SQL AG Listener communication.

4. Since this SQL Cluster is setup in the lab for learning purpose, I’m keeping all the data (SQL DB, SQL Backup and Logs) in root volume (C:). I’m not using dedicated disk storage for SQL purpose. If you are implementing this in production environment it is always a best practice to have dedicated disk storage for SQL Databases, SQL Backup and SQL Logs for better performance.

5. As you might be aware of, by default we can’t have Shared Storage (called SAN Storage) in the AWS (and I believe this is true for other Public Cloud as well) unless you use some kind of software-based Virtual SAN solution deployed in Cloud. So to implement SQL Clustering in cloud the possible scenario is “SQL AlwaysOn Availability Groups (SQL AG)” which doesn’t require shared storage to function the cluster. Mainly below are the two types of SQL Clustering scenarios, please refer Microsoft documentations for more detailed information.

(A). SQL Server AlwaysOn Availability Groups (SQL AG) – Doesn’t require Shared Storage.
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-2017

(B). SQL Server AlwaysOn Failover Cluster Instance (FCI) – Does require Shared Storage.
https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/always-on-failover-cluster-instances-sql-server?view=sql-server-201

6. Cluster nodes and Witness are joined to Active Directory domain (CloudMegh.local) running inside an AWS EC2 instance, if you have an ADDS running on-premises then you would need to have a VPN tunnel from on-prem to AWS VPC to establish the ADDS communication.

Step by Step Configuration:
Now we have discussed about high level of environment details above, let’s go ahead and start preparing and building our Cluster Nodes and File Share Witness (FSW).
1. Configure the IP address of the Cluster nodes and File Server Witness, I have configured IP address for this lab cluster environment like below.
——————————————————————————————————————–SQL-NODE1.CloudMegh.local
172.31.37.186- Primary IP (Statically assigned at OS level on the EC2 instance)
172.31.37.34 – Cluster IP (Static IP for Cluster CNO)
172.31.37.49 – SQL AG Listener IP
SQL-NODE2.CloudMegh.local
172.31.31.138 – Primary IP (Statically assigned at OS level on the EC2 instance)
172.31.31.139 – Cluster IP (Static IP for Cluster CNO)
172.31.31.140 – SQL AG Listener IP
FSW.CloudMegh.local
172.31.7.78
——————————————————————————————————————–
2. Enable “Windows Server Failover Clustering” feature on both of the SQL nodes, SQL AlwaysOn uses WSFC feature for its high availability. You can user “Server Manager” GUI or PowerShell to install this feature. We will need to reboot the nodes after installing WSFC feature.
Install-WindowsFeature –Name Failover-Clustering –IncludeManagementTools



3. Once both nodes come back online after reboot, verify if WSFC feature was installed correctly.



4. Now I will run Cluster Validation test to see if everything is fine before we move ahead to setup our Failover Cluster. We can run Cluster Validation using PowerShell or Failover Cluster Manager GUI using wizard. Below PowerShell cmdlet will run all the validation test on both Cluster Nodes.
Test-Cluster -Node “SQL-NODE1.cloudmegh.local”, “SQL-NODE2.cloudmegh.local”



Note: Cluster Validation Test report will be available under below location, once the test is completed we can review this to see if our nodes are ready to build the cluster.
Cluster Logs and Validation Report Location: C:\Windows\Cluster\Reports

If I look at my Cluster Validation report I see some warnings for Network test so let’s go ahead and verify what it is complaining about.



It is saying that I have single point of failure because both cluster nodes have single ENI attached to them. We can ignore this at least when building WSFC in AWS.



5. Now my nodes are ready to start building the cluster, again you can use PowerShell or GUI. I’m running below command to create my new WSFC Cluster.
New-Cluster -Name SQLAG-CLUSTER-Node SQL-NODE1.CloudMegh.local,SQL-NODE2.CloudMegh.local -StaticAddress 172.31.37.34,172.31.31.139





Cluster was created successfully but there were few “warnings” so let’s review the “Create Cluster Wizard” report to see what are those warning.


So the warning says that I don’t have shared storage for this cluster which is correct, I really don’t have shared storage here as I mentioned during initial discussion about this. So I’m ignoring this warning as well.




6. Verify from Active Directory, your Cluster Virtual computer object should be listing like this.



7. Verify from Failover Cluster Manager console and see if everything looks as expected.




8. Now I will go ahead and configure Quorum for my cluster. I will be using “File Share Witness” for my Cluster Quorum instead of “Disk Witness” again because if unavailability of shared storage disk. I’m using my EC2 instance “FSW.CloudMegh.local” for hosting my FSW. I created folder “WitnessShare” and shared this directory and provided my Cluster Virtual object “SQLAG-CLUSTER” Modify permission on this folder.





Understanding Windows Failover Cluster Quorum:
https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/understand-quorum
https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum

9. Now we have our SMB share “WitnessShare” ready to added to our cluster, I’m adding it to cluster as “File Share Witness” using Failover Cluster Manager console. If you are building Windows Cluster with even number of nodes then you must have witness configured to contribute in the voting to form cluster quorum majority.

Right Click on “SQLAG-CLUSTER” —–>More Actions—->Configure Cluster Quorum Settings



Select “Advanced quorum configuration” and Click on “Next”


By default “All Nodes” will be selected—->Click on “Next” to proceed further on the wizard.



Select “Configure a file share witness” —->Click on “Next”


Provide share path for File Share Witness directory on “FSW.CloudMegh.local” instance.


Click “Next” to proceed further.


Now we have our Cluster Quorum configured successfully for cluster.


Note: We have completed Windows Failover Cluster specific core configuration till this step, Now we need to move ahead with SQL specific configurations in further steps, please keep following the steps accordingly.

10. I created below AD User account “SQLSVC” and has given local administrator rights on both of the SQL nodes that are part of this cluster. SQL Services will be running under this AD user context and I might use this same account for other purpose specific to SQL but you may need to consider creating another AD user account for different purpose in SQL considering your environment security compliance and requirements.



11. Since both SQL server nodes are launched from Amazon provided AMI with SQL preinstalled, so the SQL services are running under default “NT Service” account and we need to change this to our AD User account “SQLSVC”. Please review the Microsoft documentation regarding more details about changing SQL Service startup account.
https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/scm-services-change-the-service-startup-account?view=sql-server-2017



Open “SQL Server Configuration Manager”



Navigate to “SQL Server Services”

Change “Log On” account to “SQLSVC” for SQL Server service and SQL Server Agent service. This has be done on both of the SQL Server nodes manually.

12. Enable AlwaysOn Availability Group on both of the SQL Server nodes.
 Right Click on “SQL Server” Service in SQL Server Configuration Manager console and then go “AlwaysOn High Availability” tab.

Click on check box “Enable AlwaysOn Availability Groups” and restart the SQL Services on the node. This needs to be done on both of the SQL server nodes.


13. Create SQL databases that will be part of this SQL Always On Availability Group. Open SQL Server Management Studio (SSMS).


14. Create required SQL Databases. In this lab I’m creating two SQL databases (App-DB1 and App-DB2) that will be part of my SQL Always On Availability Group.




I have got my two SQL databases created, These are just empty databases that I have created for this lab demo and they don’t have any schema.



15. Manually create Virtual Computer Object (VCO) for SQL AG Listener in Active Directory. This is called pre-staging the object in Active Directory.







VCO for SQL AG Listener is created, now we need to grant full permission to our Cluster Named Object (CNO) “SQLAG-CLUSTER$” on this VCO (SQLAG-LISTEN).



16. Take the Full Backup of all the databases that you will be adding to AG. This is one of the prerequisites for AG configuration. If you don’t perform full backup of the DBs then it will not allow to create/configure AG.



17. Configure  SQL Server 2017 Always On Availability Group

Open SQL Server Management Studio, connect to the SQL Server.



In SSMS Console, navigate to “Always On High Availability” and expand it. Then Right click on “Availability Groups” and Click on “New Availability Group Wizard” to continue.



AG creation wizard will start now. Click on the “Next” button to proceed further.



Specify AG name and leave Cluster type set as default to WSFC and click on “Next”.



Select all the SQL DBs that you would like to be part of this SQL AlwaysOn AG.


Select both SQL Server nodes for Automatic Failover with Availability Mode as “Synchronous commit”.



Under Backup Preferences tab select as per your requirement, I’m leaving it to default “Prefer Secondary”.


Under “Listener” I’m providing my SQL AG Listener VCO pre-staged in Active Directory and specifying the port 20000, you can specify any port here other than default SQL Port 1433. You also need to specify the listener IP addresses here.



Provide share location for backing up SQL DBs and logs that will be initially replicated from the backup.



Wait until validation checks get completed successfully.


Review the summary and click on “Finish” button to start creating SQL AG Lister.




It has been created successfully now.



Verify SQL AG Listener configuration from SSMS console.



You can review overall configuration and Health Status of SQL AG Listener “Show Dashboard” in SSMS console, it will look something like below.


You can also connect the other SQL Node (Secondary) from SSMS console to verify configuration.





If I open “Failover Cluster Manager (FCM)”, I have my SQL Availability Group “App-DB-AG” configured as Role in the cluster.




18. How to Test failover of SQL Always On Availability Group:
Since the cluster is ready and operational now, we can go ahead and perform failover of the nodes and see if its working as expected.

Perform a planned manual failover of an Always On availability group (SQL Server):
https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/perform-a-planned-manual-failover-of-an-availability-group-sql-server?view=sql-server-2017

(A). Manual Failover from SSMS Console:

At the moment “SQL-NODE1” is the Primary replica in the AG and SQL-NODE2 is the Secondary one. Let’s go ahead and perform the failover of the nodes from SSMS Console.





Select the available secondary replica server which will be acting as Primary after failover of the resources.





Review the summary and proceed further.


Failover is completed successfully.



Now if you verify from SSMS console, “SQL-NODE1” is acting as secondary replica in AG.



In the FCM, owner node changed to “SQL-NODE2” that is holding the cluster role right now.




(B). Manual Failover from FCM Console:
We performed above failover steps from SSMS console, we can also perform same failover from command from FCM as well. Follow the below steps.

Right Click on the SQL AG role in FCM —>Move–>Select Node.



Choose “SQL-NODE1” and Click OK. The Failover will be done in couple of seconds. Now the SQL-NODE1 again became Primary Replica.



(C). Automatic Failover Testing for SQL Always On Availability Group:

In the above steps, We learned failing over cluster roles manually from FCM and SSMS console. Now I will go ahead and demo automatic failover of the cluster role between the nodes.

If you see currently “SQL-NODE1” is acting as Primary Replica in SQL AG cluster, we will go ahead and drain this primary replica node to failover the role to another node.





Go to FCM Console—–>Nodes—–>Select “SQL-NODE1”—>Right Click—–>Pause—–>Drain Roles.

Note: “Drain Roles” basically moves the cluster role off the current node and also puts the node in paused state. This feature was introduced with Windows Server 2012 version. This is helpful in case of planned failover of the nodes or maintenance/patching of the cluster nodes.



“SQL-NODE1” is in paused state now. One thing to note here is that even if the cluster node is in PA– USED state it still does contribute to Cluster Quorum voting/formation.



Since the node “SQL-NODE1” is in paused state, the role was moved to “SQL-NODE2” automatically.



You can also verify the same from SSMS console as well.

Note: Please note that you would need to open required ports in your Security Group/NACLs for between SQL server instance to allow communication. I did’t specifically mentioned in this document because I had allowed all the ports for my VPC’s CIDR in Security Group so I didn’t have to worry about that.

So this was all about setting up SQL 2017 AlwaysOn Availability Group cluster on AWS EC2 instance. I would advise to visit and read below articles/documentations regarding this topic. I hope this will help you setting up your own lab to clarify some fundamentals around this.

Below are some important blogs and documentations related to Windows Failover Clustering:

Behavior of Dynamic Witness on Windows Server 2012 R2 Failover Clustering:
https://blogs.technet.microsoft.com/askcore/2016/03/21/behavior-of-dynamic-witness-on-windows-server-2012-r2-failover-clustering-3/

Failover Cluster Node Startup Order in Windows Server 2012 R2:
https://blogs.technet.microsoft.com/askcore/2016/01/21/failover-cluster-node-startup-order-in-windows-server-2012-r2/

Create Listener Fails with Message ‘The WSFC cluster could not bring the Network Name resource online’
https://blogs.msdn.microsoft.com/alwaysonpro/2014/03/25/create-listener-fails-with-message-the-wsfc-cluster-could-not-bring-the-network-name-resource-online/

Clustering For Mere Mortals:
https://clusteringformeremortals.com/

A deep-dive into Cluster OS Rolling Upgrades in Windows Server 2016:
https://cloudblogs.microsoft.com/windowsserver/2016/10/27/a-deep-dive-into-cluster-os-rolling-upgrades-in-windows-server-2016/

SQL Server Multi-Subnet Clustering (SQL Server):
http:// https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/windows/sql-server-multi-subnet-clustering-sql-server?view=sql-server-2017

Building True Multi-Subnet Windows Server Failover Clusters:
https://blogs.msdn.microsoft.com/sambetts/2014/05/16/building-true-multi-subnet-windows-server-failover-clusters/

1 thought on “Deploying SQL 2017 AlwaysOn Availability Group Clustering on AWS EC2 Instance running Windows Server 2016”

Leave a Reply

Your email address will not be published. Required fields are marked *