Microsoft Azure offers the standard and active geo-replication capabilities for Azure SQL databases.
This feature implements a mechanism to provide database redundancy within the same Microsoft Azure region or in different regions (geo-redundancy). Active Geo-Replication asynchronously replicates committed transactions from a database to up to four copies of the database on different servers.
When Active Geo-Replication is configured a secondary database is created on the specified server. The original database becomes the primary database. The primary database asynchronously replicates committed transactions to each of the secondary databases. While at any given point, the secondary database might be slightly behind the primary database, the secondary data is guaranteed to always be transactionally consistent with changes committed to the primary database.
When you place the secondary database on a server in a different region you add maximum resilience to your application. The cross-region redundancy enables applications to recover from a permanent loss of an entire datacenter or parts of a datacenter caused by natural disasters, catastrophic human errors, or malicious acts.
Let's put it to the test to see how far behind the primary database the asynchronous replicas will be.
We'll need to have readable secondary replicas to be able to compare the number of transactions committed on different servers.
The easiest way to compare business transactions is to create a Sequence object in the database and then in each INSERT/UPDATE use the NEXT VALUE FOR the Sequence and then compare the next available value for the sequence by querying the sys.sequences DMV.
For the test I create the sequence object dbo.Seq1
CREATE SEQUENCE dbo.Seq1 AS INT
START WITH 1 INCREMENT BY 1 NO CACHE NO CYCLE
and the main table
CREATE TABLE dbo.T1(
ID INT NOT NULL PRIMARY KEY CLUSTERED,
NumOfUpdates INT DEFAULT(0),
LastSPID INT,
CurSeq INT NOT NULL,
DummyText VARCHAR(1024),
UpdateTime datetime2,
CreateTime datetime2 DEFAULT(SYSDATETIME()));
In the mail stored procedure I
1. generate a random number between 1 and 100,000,000 which used as a key for table T1.
2. If the table already has a row with the generated key then this record will be updated with
a. a new dummy text (with a length between 21 and 1024 characters) to cause page splits when updates the existing row
b. UpdateTime equals to the current SYSDATETIME()
c. CurSeq equals the NEXT VALUE FOR dbo.Seq1
d. LastSPID with the SPID of the process updating the record
e. NumOfUpdates increments by 1
3. If the table doesn't have a row with the generated key then a new row gets inserted with
a. a dummy text (with a length between 21 and 1024 characters)
b. CreateTime equals to the current SYSDATETIME()
c. CurSeq equals the NEXT VALUE FOR dbo.Seq1
d. LastSPID with the SPID of the process updating the record
e. NumOfUpdates sets to 0
I created a primary database in the "East US" data center as V12 with the P2 level and ran 11 concurrent session calling the main stored procedure. All the clients were running from the single computer (my laptop) and I collected each second the next available sequence value
SELECT current_value from sys.sequences WITH (NOLOCK)
The average difference in the current value was 1,404 - so on average each second these 11 clients were able to make 1404 INSERTs/UPDATEs to the dbo.T1.
This is baseline for the geo-replication test.
Then I truncated all the tables in the primary database and configured 3 active replicas in the following regions:
1. West US
2. North Europe
3. Japan East
In the Azure portal the replication topology looks like this
And then I ran the same test with 11 concurrent clients again.
The CPU utilization on the test database as 100% as soon as the clients started
CPU utilization on each replica was close to 0
The average number of transactions (measured every second as a difference in the next value for the sequence object) was 1,405. The same as in the baseline.
Thus, adding 3 readable asynchronous replicas didn't hurt the primary database. This is good.
Now let's look at the transaction level measured at the beginning of each 5-second interval as a difference in the NEXT VALUE reported by the sys.sequences DMV on the primary database and on each replica.
As we can see during the test (~25 min) we have a couple of spikes in delay reported by the replica in North Europe and one spike in delay from the US West.
If we look at the average number of transaction the replicas are behind the primary database we'll see (as expected) that the US West provides the minimum average delay (186 transactions).
Surprisingly, the replica in Japan East was much better option at the time of the test than a replica in North Europe.
If we convert the delay in transactions into seconds by dividing each of the numbers by the average transaction rate at the Primary database (1,404) we get this table where numbers show delay in seconds.
And the chart for the delay in seconds is below:
I think the geo-replication is a very good option to consider when designing and developing a disaster resilient application.
Comments
Post a Comment