Recently, while working on a TB-sized SQL Server database, I got the opportunity to verify something that I have suspected for some time but had not yet looked into properly.
I am talking about how to optimize UPDATE STATISTICS performance by utilizing the SQL Server 2008 R2 Enterprise Edition data compression feature. It is a common scenario today to have a SQL Server cluster with plenty of CPU available, on top of SAN disks, and that was the case here too. In this scenario, you usually tend to be IO bound. The problem escalates when your tables are so big that they do not fit in the cache. And what seems to happen when you rebuild statistics on a table is that the statistics are rebuilt one by one.
So if, for example, you have statistics on 10 columns (index statistics aside), SQL Server will scan the table for the first column (mostly from disk) and update its statistics. But since the table does not fit in memory, by the time it scans the table to gather information about the second column, the table has to be read from disk again. And so on for every column. Obviously this is a slow and tedious process. But what happens if you compress your tables, specifically the clustered indexes of your big tables? The theory is that since the table/index stays compressed in memory as well, more of the table should fit there at once, dramatically speeding up the process described above. And as mentioned, I finally got around to verifying this in a real production environment.
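Enabling page compression on a clustered index is a one-time operation. A minimal sketch, assuming a hypothetical table `dbo.BigTable` whose clustered index is `PK_BigTable` (your object names will differ):

```sql
-- Estimate the space savings before committing to the rebuild
-- (index_id 1 is the clustered index):
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'BigTable',
    @index_id         = 1,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild the clustered index with page compression;
-- ONLINE = ON (Enterprise Edition) keeps the table available during the rebuild:
ALTER INDEX PK_BigTable ON dbo.BigTable
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```

Running the estimate first is worthwhile: if the data does not compress well, the extra CPU cost of compression buys you little reduction in IO.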
The result? Absolutely fantastic! I saw a 5–8 times improvement in running "UPDATE STATISTICS WITH SAMPLE 40 PERCENT", which is what we use at this particular customer. And that just from enabling a feature that the customer had already paid for, and that is completely transparent to the application. There are a lot of other advantages to SQL Server compression (and other Enterprise features), but more on that later 🙂
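For reference, the statistics update itself looks like this (again with the hypothetical table name `dbo.BigTable`):

```sql
-- Update all statistics on the table with a 40 percent sample:
UPDATE STATISTICS dbo.BigTable WITH SAMPLE 40 PERCENT;

-- Or target a single statistics object if only one column is the problem:
UPDATE STATISTICS dbo.BigTable IX_BigTable_SomeColumn WITH SAMPLE 40 PERCENT;
```

The sample percentage is a trade-off between scan time and statistics quality; 40 percent happened to be the right balance for this customer's workload.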
If you need SQL Server assistance, don’t hesitate to contact us!