A note on compression performance

Compression is an often-discussed topic, and an often under-estimated one. In this post I will not spend too much time discussing the caveats of compression, aside from pointing out that compression in its essence is "trading off IO space and performance for the price of CPU cycles". It is up to the DBA / end user to decide which one is more expensive, more valuable, over-utilized and so on.
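Before making that call, it can help to estimate what a given setting would actually buy you. Below is a minimal sketch using the built-in sp_estimate_data_compression_savings procedure; the database, schema and table names are placeholders matching the example used later in this post:

USE [SomeDB]
-- Estimate the space savings of PAGE compression for the clustered index (index_id = 1)
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'SomeTable',
    @index_id         = 1,
    @partition_number = NULL,
    @data_compression = N'PAGE';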

I would like to mention the following, though: applying compression means rebuilding objects, which means that the usual "rebuilding rules" apply with full force here as well. In other words, compression can be applied to heaps, indexes, partitions and so on; when the compression is applied, there is an option to run the operation ONLINE or OFFLINE and to enjoy the benefits of parallelism. What are the tradeoffs? It depends how fast we need the object compressed, it depends on what resources we have available, and it depends on the load of the system.

As a test, I have a table with 20 million rows which has a clustered index. The goal is to alter it to use PAGE compression with the least impact on the system.
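Before running the rebuild, it is worth capturing a baseline of the current compression setting and size of the index. A small sketch against the same placeholder names, using the sys.partitions and sys.dm_db_partition_stats views:

USE [SomeDB]
-- Current compression setting and page count for each partition of the table's indexes
SELECT  i.name AS index_name,
        p.partition_number,
        p.data_compression_desc,
        ps.used_page_count
FROM    sys.indexes AS i
JOIN    sys.partitions AS p
        ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN    sys.dm_db_partition_stats AS ps
        ON ps.partition_id = p.partition_id
WHERE   i.object_id = OBJECT_ID(N'dbo.SomeTable');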

A way to do this is by using this script:

-- OFFLINE, with parallelism across all available CPUs
USE [SomeDB]
ALTER INDEX [IDX_1] ON [dbo].[SomeTable] REBUILD PARTITION = ALL
WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON,
    ONLINE = OFF,
    SORT_IN_TEMPDB = OFF,
    DATA_COMPRESSION = PAGE
)

This takes about 30 seconds on my sandbox (it's a beefy sandbox, right?!), and during this time my index is not available, since it is read completely and re-written again.

Another way to do it is to use this script:

-- ONLINE, with 1 CPU
USE [SomeDB]
ALTER INDEX [IDX_1] ON [dbo].[SomeTable] REBUILD PARTITION = ALL
WITH (
    PAD_INDEX = OFF,
    STATISTICS_NORECOMPUTE = OFF,
    ALLOW_ROW_LOCKS = ON,
    ALLOW_PAGE_LOCKS = ON,
    ONLINE = ON,
    MAXDOP = 1,
    SORT_IN_TEMPDB = ON,
    DATA_COMPRESSION = PAGE
)

So, what is the difference? In the second script I am using the ONLINE option and a single CPU (MAXDOP = 1) to compress the data. It is a significantly slower process (it took about 5 minutes), but my data was online, and my CPU performance was not affected at all. Also, what is good to monitor during compression setting changes (aside from CPU, memory and IO pressure) are the SQL Server Access Methods counters: Page Compression Attempts/sec and Pages Compressed/sec. In my case the first script was compressing an average of 2000 pages/sec, and in the second case it was working at about 330 pages/sec.
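Those two counters can also be read from T-SQL while the rebuild is running, via the sys.dm_os_performance_counters DMV. A sketch follows; note that these are cumulative counters, so to get a per-second rate you take two snapshots and divide the difference by the elapsed time, and on a named instance the object_name prefix is MSSQL$<instance> rather than SQLServer:

-- Snapshot of the page compression counters under the Access Methods object
SELECT  object_name,
        counter_name,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE '%Access Methods%'
  AND   (counter_name LIKE 'Page compression attempts%'
     OR  counter_name LIKE 'Pages compressed%');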

Once again, “it depends” is the key phrase. 🙂