Skip to main content

SQL 2014 - ColumnStore Compression

I've decided to compare the compression options available in SQL 2014 CTP2 and created two identical tables T1 (without compression) and T2 (with PAGE compression).

CREATE TABLE T1 (
ID INT IDENTITY PRIMARY  KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
);
GO

CREATE TABLE T2 (
ID INT IDENTITY PRIMARY  KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
) WITH (DATA_COMPRESSION = PAGE);
GO

I populated them with 42,5 mil records like this

Declare
@i int,
@j int,
@age tinyint,
@id int,
@maxFN INT,
@maxLN INT,
@province CHAR(2)
, @st datetime;

Declare
@Prov TABLE (ID INT IDENTITY, Name CHAR(2));


SET NOCOUNT ON;

INSERT INTO @Prov(Name) VALUES ('ON');
INSERT INTO @Prov(Name) VALUES ('QC');
INSERT INTO @Prov(Name) VALUES ('NS');
INSERT INTO @Prov(Name) VALUES ('NB');
INSERT INTO @Prov(Name) VALUES ('MB');
INSERT INTO @Prov(Name) VALUES ('BC');
INSERT INTO @Prov(Name) VALUES ('PE');
INSERT INTO @Prov(Name) VALUES ('SK');
INSERT INTO @Prov(Name) VALUES ('NL');
INSERT INTO @Prov(Name) VALUES ('NT');
INSERT INTO @Prov(Name) VALUES ('YT');
INSERT INTO @Prov(Name) VALUES ('NU');

SELECT @maxFN = MAX(ID) FROM dbo.GivenNames;
SELECT @maxLN = MAX(ID) FROM dbo.SurNames;

SET @st = GETDATE();
SET @j = 1
WHILE @j<=345
BEGIN
BEGIN TRANSACTION;
SET @i = 1
WHILE @i <=123456
BEGIN
SET @age = (ABS(CONVERT(INT, convert(binary(8), newid()))) / 10001) % 117 +1

SELECT @Province = Name FROM @Prov WHERE ID = ( (@Age % 12) +1)
-- select @age, @province
INSERT INTO T1 VALUES (
(SELECT GivenName FROM GIVENNAMES WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 345678) % @maxFN +1),
(SELECT SurName   FROM SurNAMES   WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 123456) % @maxLN +1)
, @Age
, @Province
)
SET @i +=1;
END
COMMIT;
SET @j +=1
END;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T1 completed'
-- 1421

SET @st = GETDATE();
INSERT INTO T2 WITH (TABLOCK) ([FirstName], [LastName], [Age], [Province]) SELECT [FirstName], [LastName], [Age], [Province] FROM T1;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T2 completed'
-- 438

The sp_spaceused reported these results

EXEC sp_spaceused 't1';
-- Data Index
-- 1429904 KB 5344 KB
EXEC sp_spaceused 't2';
-- Data Index
-- 990288 KB 4120 KB


And then copied T1 into CST

SELECT * INTO CST FROM t1;

and created a columnstore clustered index

CREATE CLUSTERED COLUMNSTORE INDEX NCI_CST ON CST ;

and sp_spaceused reports 

EXEC sp_spaceused 'CST';
-- Data Index
-- 304392 KB 0 KB

and finally I rebuilt the columnstore clustred index with the COLUMNSTORE_ARCHIVE compression

And now the results are even better

EXEC sp_spaceused 'CST';
-- Data
-- 230840 KB


And finally I exported the T1 table with BCP

bcp t1 out c:\temp\t1.txt -c   -S .\sql2014ctp2 -d ColumnTest -T

and compressed the text file using 7-zip

The results clearly shows that the COLUMNSTORE compression is as good as 7-zip compression applied to the text file and COLUMNSTORE_ARCHIVE gives you additional 25% improvement.



Comments

Popular posts from this blog

Create 3-Node Windows 2012 Multi-subnet Cluster

Environment There are two Data centers connected via a WAN link. Two Windows 2012 Servers (called SQLDEV1 and SQLDEV2) are located in the Primary Data Center (on the IP subnet 192.168.79.0/24) and the third server is placed in the Secondary Data Center with the 192.168.69.0/24 subnet. We’ll be creating a three-node Windows cluster with no shared storage on the multi subnet network with a file share witness at the Primary Data Center. We’ll be using a file share witness to protect from the cluster failure in a situation when the network between the Data Centers is unavailable and one of the servers in the Primary Data Center is also down (or being rebooted). The final state will look like depicted above: -           Two Virtual IP’s will be assigned (192.168.76.218 and 192.168.69.134) to the cluster -           The servers at the Primary Data Center will have a vote (Vote=1) and the ...

Joining Windows 10 to Azure AD Domain

As of October 2016 to join Windows 10 computers to Azure AD Domain service requires these steps: Create a VNET in the classic portal . The VNET must be placed to a region where Azure AD domain service is available (( https://azure.microsoft.com/en-us/regions/services/ )  In the classic portal  go to Directory -> Configure and enable the domain service. And wait for ~ 30 min When completed the IP address will be populated Go back to the VNET configuration and add a DNS server with the IP (10.0.0.4 in this case) Create the "AAD DC Administrator" administrators group (again in Directory -> Group). Members of this group are granted administrative privileges on machines that are domain-joined to the Azure AD Domain Services managed domain. Add to the group your users who are supposed to have the administrative access on a Windows 10 computer go to Settings -> Accounts (this is true for Windows 10 version  1607) then select 'Access...

SQL 2012 AlwaysOn: Synchronous vs. Asynchronous commit. Performance impact

Recently I've had a chance to build a 3-server AlwaysOn environment distributed between the primary and secondary data centers. The configuration looks like this: Primary Data Center                         Secondary Data Center                        SQLDEV1                                        SQLDEV3          SQLDEV2 The availability group was crated with synchronous commit replicas on SQLDEV1 and SQLDEV2 and the replica on SQLDEV3 was configured for asynchronous commit. The link between the data centers was not great and when I pinged SQLDEV3 from SQLDEV1 I got these results Approximate round trip times in milli-seconds:     Minimum = 39ms, Maximum = 63ms, Average = 42ms I also created a very simp...