I've decided to compare the compression options available in SQL 2014 CTP2 and created two identical tables T1 (without compression) and T2 (with PAGE compression).
CREATE TABLE T1 (
ID INT IDENTITY PRIMARY KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
);
GO
CREATE TABLE T2 (
ID INT IDENTITY PRIMARY KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
) WITH (DATA_COMPRESSION = PAGE);
GO
CREATE TABLE T1 (
ID INT IDENTITY PRIMARY KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
);
GO
CREATE TABLE T2 (
ID INT IDENTITY PRIMARY KEY CLUSTERED
, FirstName VARCHAR(255)
, LastName VARCHAR(255)
, Age TinyInt
, Province CHAR(2)
) WITH (DATA_COMPRESSION = PAGE);
GO
I populated them with 42,5 mil records like this
Declare
@i int,
@j int,
@age tinyint,
@id int,
@maxFN INT,
@maxLN INT,
@province CHAR(2)
, @st datetime;
Declare
@Prov TABLE (ID INT IDENTITY, Name CHAR(2));
SET NOCOUNT ON;
INSERT INTO @Prov(Name) VALUES ('ON');
INSERT INTO @Prov(Name) VALUES ('QC');
INSERT INTO @Prov(Name) VALUES ('NS');
INSERT INTO @Prov(Name) VALUES ('NB');
INSERT INTO @Prov(Name) VALUES ('MB');
INSERT INTO @Prov(Name) VALUES ('BC');
INSERT INTO @Prov(Name) VALUES ('PE');
INSERT INTO @Prov(Name) VALUES ('SK');
INSERT INTO @Prov(Name) VALUES ('NL');
INSERT INTO @Prov(Name) VALUES ('NT');
INSERT INTO @Prov(Name) VALUES ('YT');
INSERT INTO @Prov(Name) VALUES ('NU');
SELECT @maxFN = MAX(ID) FROM dbo.GivenNames;
SELECT @maxLN = MAX(ID) FROM dbo.SurNames;
SET @st = GETDATE();
SET @j = 1
WHILE @j<=345
BEGIN
BEGIN TRANSACTION;
SET @i = 1
WHILE @i <=123456
BEGIN
SET @age = (ABS(CONVERT(INT, convert(binary(8), newid()))) / 10001) % 117 +1
SELECT @Province = Name FROM @Prov WHERE ID = ( (@Age % 12) +1)
-- select @age, @province
INSERT INTO T1 VALUES (
(SELECT GivenName FROM GIVENNAMES WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 345678) % @maxFN +1),
(SELECT SurName FROM SurNAMES WHERE ID = ABS(CONVERT(INT, convert(binary(8), newid())) / 123456) % @maxLN +1)
, @Age
, @Province
)
SET @i +=1;
END
COMMIT;
SET @j +=1
END;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T1 completed'
-- 1421
SET @st = GETDATE();
INSERT INTO T2 WITH (TABLOCK) ([FirstName], [LastName], [Age], [Province]) SELECT [FirstName], [LastName], [Age], [Province] FROM T1;
SELECT DATEDIFF(second, @st, getdate()) AS 'Insert into T2 completed'
-- 438
The sp_spaceused reported these results
EXEC sp_spaceused 't1';
-- Data Index
-- 1429904 KB 5344 KB
EXEC sp_spaceused 't2';
-- Data Index
-- 990288 KB 4120 KB
And then copied T1 into CST
SELECT * INTO CST FROM t1;
and created a columnstore clustered index
CREATE CLUSTERED COLUMNSTORE INDEX NCI_CST ON CST ;
and sp_spaceused reports
EXEC sp_spaceused 'CST';
-- Data Index
-- 304392 KB 0 KB
and finally I rebuilt the columnstore clustred index with the COLUMNSTORE_ARCHIVE compression
And now the results are even better
EXEC sp_spaceused 'CST';
-- Data
-- 230840 KB
And finally I exported the T1 table with BCP
bcp t1 out c:\temp\t1.txt -c -S .\sql2014ctp2 -d ColumnTest -T
and compressed the text file using 7-zip
The results clearly shows that the COLUMNSTORE compression is as good as 7-zip compression applied to the text file and COLUMNSTORE_ARCHIVE gives you additional 25% improvement.
Comments
Post a Comment