Skip to main content

How good Azure geo-replication is

Microsoft Azure offers the standard and active geo-replication capabilities for Azure SQL databases. 

This feature implements a mechanism to provide database redundancy within the same Microsoft Azure region or in different regions (geo-redundancy). Active Geo-Replication asynchronously replicates committed transactions from a database to up to four copies of the database on different servers. 
When Active Geo-Replication is configured a secondary database is created on the specified server. The original database becomes the primary database. The primary database asynchronously replicates committed transactions to each of the secondary databases. While at any given point, the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.
When you place the secondary database on a server in a different region you add maximum resilience to your application. The cross-region redundancy enables applications to recover from a permanent loss of an entire datacenter or parts of a datacenter caused by natural disasters, catastrophic human errors, or malicious acts. 
Let's put it to the test to see how far behind the primary database the asynchronous replicas will be.
We'll need to have readable secondary replicas to be able to compare the number of transactions committed on different servers.
The easiest way to compare business transactions is to create a Sequence object in the database and then in each INSERT/UPDATE use the NEXT VALUE FOR the Sequence and then compare the next available value for the sequence by querying the sys.sequences DMV.
For the test I create the sequence object dbo.Seq1
CREATE SEQUENCE dbo.Seq1 AS INT 
START WITH 1 INCREMENT BY 1 NO CACHE NO CYCLE

 and the main table 
CREATE TABLE dbo.T1(
ID INT NOT NULL PRIMARY KEY CLUSTERED,
NumOfUpdates INT DEFAULT(0),
LastSPID INT,
CurSeq INT NOT NULL,
DummyText VARCHAR(1024),
UpdateTime datetime2,
CreateTime datetime2 DEFAULT(SYSDATETIME()));

In the mail stored procedure I 
1. generate a random number between 1 and 100,000,000 which used as a key for table T1.
2. If the table already has a row with the generated key then this record will be updated with
    a. a new dummy text (with a length between 21 and 1024 characters) to cause page splits when updates the existing row
   b. UpdateTime equals to the current SYSDATETIME()
   c. CurSeq equals the NEXT VALUE FOR dbo.Seq1
   d. LastSPID with the SPID of the process updating the record
   e. NumOfUpdates increments by 1

3. If the table doesn't have a row with the generated key then a new row gets inserted with
    a. a dummy text (with a length between 21 and 1024 characters)
   b. CreateTime equals to the current SYSDATETIME()
   c. CurSeq equals the NEXT VALUE FOR dbo.Seq1
   d. LastSPID with the SPID of the process updating the record
   e. NumOfUpdates sets to 0

I created a primary database in the "East US" data center as V12 with the P2 level and ran 11 concurrent session calling the main stored procedure. All the clients were running from the single computer (my laptop) and I collected each second the next available sequence value
SELECT current_value from sys.sequences WITH (NOLOCK)

The average difference in the current value was 1,404 - so on average each second these 11 clients were able to make 1404 INSERTs/UPDATEs to the dbo.T1.
This is baseline for the geo-replication test.
Then I truncated all the tables in the primary database and configured 3 active replicas in the following regions:
1. West US
2. North Europe
3. Japan East
In the Azure portal the replication topology looks like this

And then I ran the same test with 11 concurrent clients again.
The CPU utilization on the test database as 100% as soon as the clients started


 CPU utilization on each replica was close to 0

The average number of transactions (measured every second as a difference in the next value for the sequence object) was 1,405. The same as in the baseline. 
Thus, adding 3 readable asynchronous replicas didn't hurt the primary database. This is good.
Now let's look at the transaction level measured at the beginning of each 5-second interval as a difference in the NEXT VALUE reported by the sys.sequences DMV on the primary database and on each replica.
As we can see during the test (~25 min) we have a couple of spikes in delay reported by the replica in North Europe  and one spike in delay from the US West. 
If we look at the average number of transaction the replicas are behind the primary database we'll see (as expected) that the US West provides the minimum average delay (186 transactions).
Surprisingly, the replica in Japan East was much better option at the time of the test than a replica in North Europe.

If we convert the delay in transactions into seconds by dividing each of the numbers by the average transaction rate at the Primary database (1,404) we get this table where numbers show delay in seconds.

And the chart for the delay in seconds is below:


I think the geo-replication is a very good option to consider when designing and developing a disaster resilient application.








Comments

Popular posts from this blog

Create 3-Node Windows 2012 Multi-subnet Cluster

Environment There are two Data centers connected via a WAN link. Two Windows 2012 Servers (called SQLDEV1 and SQLDEV2) are located in the Primary Data Center (on the IP subnet 192.168.79.0/24) and the third server is placed in the Secondary Data Center with the 192.168.69.0/24 subnet. We’ll be creating a three-node Windows cluster with no shared storage on the multi subnet network with a file share witness at the Primary Data Center. We’ll be using a file share witness to protect from the cluster failure in a situation when the network between the Data Centers is unavailable and one of the servers in the Primary Data Center is also down (or being rebooted). The final state will look like depicted above: -           Two Virtual IP’s will be assigned (192.168.76.218 and 192.168.69.134) to the cluster -           The servers at the Primary Data Center will have a vote (Vote=1) and the ...

Joining Windows 10 to Azure AD Domain

As of October 2016 to join Windows 10 computers to Azure AD Domain service requires these steps: Create a VNET in the classic portal . The VNET must be placed to a region where Azure AD domain service is available (( https://azure.microsoft.com/en-us/regions/services/ )  In the classic portal  go to Directory -> Configure and enable the domain service. And wait for ~ 30 min When completed the IP address will be populated Go back to the VNET configuration and add a DNS server with the IP (10.0.0.4 in this case) Create the "AAD DC Administrator" administrators group (again in Directory -> Group). Members of this group are granted administrative privileges on machines that are domain-joined to the Azure AD Domain Services managed domain. Add to the group your users who are supposed to have the administrative access on a Windows 10 computer go to Settings -> Accounts (this is true for Windows 10 version  1607) then select 'Access...

SQL 2012 AlwaysOn: Synchronous vs. Asynchronous commit. Performance impact

Recently I've had a chance to build a 3-server AlwaysOn environment distributed between the primary and secondary data centers. The configuration looks like this: Primary Data Center                         Secondary Data Center                        SQLDEV1                                        SQLDEV3          SQLDEV2 The availability group was crated with synchronous commit replicas on SQLDEV1 and SQLDEV2 and the replica on SQLDEV3 was configured for asynchronous commit. The link between the data centers was not great and when I pinged SQLDEV3 from SQLDEV1 I got these results Approximate round trip times in milli-seconds:     Minimum = 39ms, Maximum = 63ms, Average = 42ms I also created a very simp...