Skip to main content

SQL 2012 AlwaysOn - Recovery from a Disaster

We have configured 3 SQL server in an AlwaysOn Availability group (see here)
Two servers (SQLDEV1 and SQLDEV2) are configured for automatic failover.

To test how long it takes for SQL Server to fail over from SQLDEV1 to SQLDEV2 we connect to the availability group listener name (sqldevcluster1)


and on the "Additional Connection Parameters" tab we need to add

Multisubnetfailover=TRUE




We can check the current status of the servers in the availability group with this query


select rs.role_desc, rs.connected_state_desc, rs.synchronization_health_desc, cs.replica_server_name
from sys.dm_hadr_availability_replica_states AS rs
JOIN sys.dm_hadr_availability_replica_cluster_states as CS ON rs.replica_id=cs.replica_id



and it returns

If we connect to SQLDEV2 and initiate manual failover


ALTER AVAILABILITY GROUP AG1 FAILOVER;



this statement takes 17 seconds to complete.
And if we check the status of the availability group from the session connected to sqldevcluster1 we get the transport-level error (because the original connection was closed). This error message was returned in about 10 seconds.


Msg 121, Level 20, State 0, Line 0
A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.)

If we check the status of the availability group again the query returns the results instantaneously

and we see that the primary role is owned by SQLDEV2.

Now if we stop the SQL Server service on SQLDEV2 then in about 15 seconds we'll see roles swapped as


 and if we re-run the query in 5 seconds the connection with SQLDEV2 is established

and in a couple of seconds the synchronization gets to the healthy status.

If we reboot the SQLDEV1 server now the failover process is exactly the same. in about 20 seconds SQLDEV2 takes over the primary role  and then automatically synchronizes itself with SQLDEV1 when it comes back online.

Recovery from a Disaster


The Disaster Recovery process  is different because it needs some additional actions and manual steps.

When the primary data center is inaccessible the windows cluster service on SQLDEV3 cannot start because there. is no quorum. So, our first step is to force the cluster service to start without quorum.
To do this we need to execute the following command from the elevated command prompt

net start clussvc /forcequorum

After forcing the cluster service to start we will see on SQLDEV3 in SSMS


Now we need to initiate manual failover with possible data lost

ALTER AVAILABILITY GROUP AG1 FORCE_FAILOVER_ALLOW_DATA_LOSS;


And this will make SQLDEV3 our primary as show below



Resuming Normal Operations


When the Primary Data Center comes back online and both SLDEV1 and SQLDEV2 servers are operational we have to resume the replication within the availability group.

We have to connecte to each of SQLDEV1 and SQLDEV2 and resume the  replication by executing the following statements

ALTER DATABASE TESTDB1 SET HADR RESUME;

ALTER DATABASE TESTDB2 SET HADR RESUME;



To transfer the primary role back to SQLDEV1 we need temporarily to change the replication mode to synchronous commit on all three servers.
We do this from SQLDEV3 by executing the following statements:

USE master
Go

ALTER AVAILABILITY GROUP AG1 MODIFY REPLICA ON N'SQLDEV1' WITH (AVAILABILITY_MODE=SYNCHRONOUS_COMMIT);
GO
ALTER AVAILABILITY GROUP AG1 MODIFY REPLICA ON N'SQLDEV2' WITH (AVAILABILITY_MODE=SYNCHRONOUS_COMMIT);
GO

ALTER AVAILABILITY GROUP AG1 MODIFY REPLICA ON N'SQLDEV3' WITH (AVAILABILITY_MODE=SYNCHRONOUS_COMMIT);
GO


Then when connected to SQLDEV1 we do automatic failover

ALTER AVAILABILITY GROUP AG1 FAILOVER


And after successful role transfer to SQLDEV1 we change the replication mode on SQLDEV3 back to asynchronous commit

ALTER AVAILABILITY GROUP AG1 MODIFY REPLICA ON N'SQLDEV3' WITH (AVAILABILITY_MODE=ASYNCHRONOUS_COMMIT);
GO


And the final step is to transfer the core Cluster Resource back to a server in the Primary Data Center.
In Failover Cluster Manager



we see that the active virtual IP is 192.168.69.134 which is an IP in the Secondary Data Center.

To move the core cluster resources back to SQLDEV1

Right-click on the cluster name and select


More Actions à Move Core Cluster Resources à Select Node

When completed (in ~10 seconds) the status will show that current Host server is SQLDEV1





Comments

Popular posts from this blog

Joining Windows 10 to Azure AD Domain

As of October 2016 to join Windows 10 computers to Azure AD Domain service requires these steps: Create a VNET in the classic portal . The VNET must be placed to a region where Azure AD domain service is available (( https://azure.microsoft.com/en-us/regions/services/ )  In the classic portal  go to Directory -> Configure and enable the domain service. And wait for ~ 30 min When completed the IP address will be populated Go back to the VNET configuration and add a DNS server with the IP (10.0.0.4 in this case) Create the "AAD DC Administrator" administrators group (again in Directory -> Group). Members of this group are granted administrative privileges on machines that are domain-joined to the Azure AD Domain Services managed domain. Add to the group your users who are supposed to have the administrative access on a Windows 10 computer go to Settings -> Accounts (this is true for Windows 10 version  1607) then select 'Access work

Create 3-Node Windows 2012 Multi-subnet Cluster

Environment There are two Data centers connected via a WAN link. Two Windows 2012 Servers (called SQLDEV1 and SQLDEV2) are located in the Primary Data Center (on the IP subnet 192.168.79.0/24) and the third server is placed in the Secondary Data Center with the 192.168.69.0/24 subnet. We’ll be creating a three-node Windows cluster with no shared storage on the multi subnet network with a file share witness at the Primary Data Center. We’ll be using a file share witness to protect from the cluster failure in a situation when the network between the Data Centers is unavailable and one of the servers in the Primary Data Center is also down (or being rebooted). The final state will look like depicted above: -           Two Virtual IP’s will be assigned (192.168.76.218 and 192.168.69.134) to the cluster -           The servers at the Primary Data Center will have a vote (Vote=1) and the server at the Secondary Data Center will have no vote (Vote=0). The file share witness al

Generate Calendar Table in Power BI with Fiscal Year Attributes

In Power BI go to Get Data --> Blank Query and paste into the Function windows the text below. This function takes as parameters: - StartYear (e.g., 2012) - EndYear (e.g., 2018) -FiscalYearStartMonth (e.g., 4) And it will generate a calendar table for dates from Jan-1-<StartYear> till Dec-31-<EndYear> and columns for Fiscal Year, Fiscal Month, Fiscal Quarter, Fiscal Year Day where the Fiscal year begins on FiscalYearStartMonth = (StartYear as number, EndYear as number, FiscalYearStartMonth as number)=> let     //Capture the date range from the parameters     StartDate = #date(StartYear, 1, 1),     EndDate = #date(EndYear, 12, 31), //Get the number of dates that will be required for the table     GetDateCount = Duration.Days(EndDate - StartDate)+1, //Take the count of dates and turn it into a list of dates     GetDateList = List.Dates(StartDate, GetDateCount,     #duration(1,0,0,0)), //Convert the list into a table     DateListToTable