Skip to main content

SQL 2014 - ColumnStore Compression

I've decided to compare the compression options available in SQL 2014 CTP2 and created two identical tables T1 (without compression) and T2 (with PAGE compression).

CREATE TABLE T1 (
ID INT IDENTITY PRIMARY  KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
);
GO

CREATE TABLE T2 (
ID INT IDENTITY PRIMARY  KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
) WITH (DATA_COMPRESSION = PAGE);
GO

I populated them with 42,5 mil records like this

Declare
@i int,
@j int,
@age tinyint,
@id int,
@maxFN INT,
@maxLN INT,
@province CHAR(2)
, @st datetime;

Declare
@Prov TABLE (ID INT IDENTITY, Name CHAR(2));


SET NOCOUNT ON;

INSERT INTO @Prov(Name) VALUES ('ON');
INSERT INTO @Prov(Name) VALUES ('QC');
INSERT INTO @Prov(Name) VALUES ('NS');
INSERT INTO @Prov(Name) VALUES ('NB');
INSERT INTO @Prov(Name) VALUES ('MB');
INSERT INTO @Prov(Name) VALUES ('BC');
INSERT INTO @Prov(Name) VALUES ('PE');
INSERT INTO @Prov(Name) VALUES ('SK');
INSERT INTO @Prov(Name) VALUES ('NL');
INSERT INTO @Prov(Name) VALUES ('NT');
INSERT INTO @Prov(Name) VALUES ('YT');
INSERT INTO @Prov(Name) VALUES ('NU');

SELECT @maxFN = MAX(ID) FROM dbo.GivenNames;
SELECT @maxLN = MAX(ID) FROM dbo.SurNames;

SET @st = GETDATE();
SET @j = 1
WHILE @j<=345
BEGIN
BEGIN TRANSACTION;
SET @i = 1
WHILE @i <=123456
BEGIN
SET @age = (ABS(CONVERT(INT, convert(binary(8), newid()))) / 10001) % 117 +1

SELECT @Province = Name FROM @Prov WHERE ID = ( (@Age % 12) +1)
-- select @age, @province
INSERT INTO T1 VALUES (
(SELECT GivenName FROM GIVENNAMES WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 345678) % @maxFN +1),
(SELECT SurName   FROM SurNAMES   WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 123456) % @maxLN +1)
, @Age
, @Province
)
SET @i +=1;
END
COMMIT;
SET @j +=1
END;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T1 completed'
-- 1421

SET @st = GETDATE();
INSERT INTO T2 WITH (TABLOCK) ([FirstName], [LastName], [Age], [Province]) SELECT [FirstName], [LastName], [Age], [Province] FROM T1;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T2 completed'
-- 438

The sp_spaceused reported these results

EXEC sp_spaceused 't1';
-- Data Index
-- 1429904 KB 5344 KB
EXEC sp_spaceused 't2';
-- Data Index
-- 990288 KB 4120 KB


And then copied T1 into CST

SELECT * INTO CST FROM t1;

and created a columnstore clustered index

CREATE CLUSTERED COLUMNSTORE INDEX NCI_CST ON CST ;

and sp_spaceused reports 

EXEC sp_spaceused 'CST';
-- Data Index
-- 304392 KB 0 KB

and finally I rebuilt the columnstore clustred index with the COLUMNSTORE_ARCHIVE compression

And now the results are even better

EXEC sp_spaceused 'CST';
-- Data
-- 230840 KB


And finally I exported the T1 table with BCP

bcp t1 out c:\temp\t1.txt -c   -S .\sql2014ctp2 -d ColumnTest -T

and compressed the text file using 7-zip

The results clearly shows that the COLUMNSTORE compression is as good as 7-zip compression applied to the text file and COLUMNSTORE_ARCHIVE gives you additional 25% improvement.



Comments

Popular posts from this blog

Joining Windows 10 to Azure AD Domain

As of October 2016 to join Windows 10 computers to Azure AD Domain service requires these steps: Create a VNET in the classic portal . The VNET must be placed to a region where Azure AD domain service is available (( https://azure.microsoft.com/en-us/regions/services/ )  In the classic portal  go to Directory -> Configure and enable the domain service. And wait for ~ 30 min When completed the IP address will be populated Go back to the VNET configuration and add a DNS server with the IP (10.0.0.4 in this case) Create the "AAD DC Administrator" administrators group (again in Directory -> Group). Members of this group are granted administrative privileges on machines that are domain-joined to the Azure AD Domain Services managed domain. Add to the group your users who are supposed to have the administrative access on a Windows 10 computer go to Settings -> Accounts (this is true for Windows 10 version  1607) then select 'Access work

Create 3-Node Windows 2012 Multi-subnet Cluster

Environment There are two Data centers connected via a WAN link. Two Windows 2012 Servers (called SQLDEV1 and SQLDEV2) are located in the Primary Data Center (on the IP subnet 192.168.79.0/24) and the third server is placed in the Secondary Data Center with the 192.168.69.0/24 subnet. We’ll be creating a three-node Windows cluster with no shared storage on the multi subnet network with a file share witness at the Primary Data Center. We’ll be using a file share witness to protect from the cluster failure in a situation when the network between the Data Centers is unavailable and one of the servers in the Primary Data Center is also down (or being rebooted). The final state will look like depicted above: -           Two Virtual IP’s will be assigned (192.168.76.218 and 192.168.69.134) to the cluster -           The servers at the Primary Data Center will have a vote (Vote=1) and the server at the Secondary Data Center will have no vote (Vote=0). The file share witness al

Generate Calendar Table in Power BI with Fiscal Year Attributes

In Power BI go to Get Data --> Blank Query and paste into the Function windows the text below. This function takes as parameters: - StartYear (e.g., 2012) - EndYear (e.g., 2018) -FiscalYearStartMonth (e.g., 4) And it will generate a calendar table for dates from Jan-1-<StartYear> till Dec-31-<EndYear> and columns for Fiscal Year, Fiscal Month, Fiscal Quarter, Fiscal Year Day where the Fiscal year begins on FiscalYearStartMonth = (StartYear as number, EndYear as number, FiscalYearStartMonth as number)=> let     //Capture the date range from the parameters     StartDate = #date(StartYear, 1, 1),     EndDate = #date(EndYear, 12, 31), //Get the number of dates that will be required for the table     GetDateCount = Duration.Days(EndDate - StartDate)+1, //Take the count of dates and turn it into a list of dates     GetDateList = List.Dates(StartDate, GetDateCount,     #duration(1,0,0,0)), //Convert the list into a table     DateListToTable