Guest Blog: Performance Tip – Transactions Can Actually Boost Throughput


SQL Service is happy to welcome Michael K Campbell as a guest blogger! Michael K Campbell (mike@sqlserveraudits.com and www.sqlserveraudits.com) is a contributing editor for SQL Server Magazine and a consultant with years of SQL Server DBA and developer experience. He enjoys consulting, development, and creating free videos for www.sqlservervideos.com.

Over to you Michael!

************************************

Good developers typically have a good handle on what Donald Knuth’s famous quote about “premature optimization [being] the root of all evil” is all about – it’s a constant reminder that developers shouldn’t needlessly agonize over every line of code and worry about its potential performance cost. Instead, they should code as necessary (using as many best practices as possible), and then come BACK and manage any performance tuning needed – once their application is put under load.

Still, one area where even seasoned developers tend to be a bit leery about potential performance problems is explicit transactions, because any seasoned developer knows that transactions come with a certain amount of processing overhead and increased resource utilization.

Explicit Transactions Can Actually Make SOME Operations Faster

As such, it can come as a big surprise to many developers that there are SOME kinds of operations that you can actually SPEED up in SQL Server by wrapping them within explicit transactions. The key to gaining a sense for how this works is to remember that the SQL Server Transaction Log has to keep writing out changes, serially, as each and every data modification comes through. Consequently, if a large number of UPDATEs or INSERTs is fired off one after another, the Transaction Log gets hit by a bunch of back-to-back, individual, distinct operations that can each be rolled back independently if needed. And because each of those statements auto-commits on its own, every one of them forces its own commit, and every commit forces the log buffer to be flushed to disk.
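To make the contrast concrete, here's a minimal, purely illustrative sketch (the table and values here are hypothetical); outside an explicit transaction every statement is its own auto-commit transaction, while BEGIN TRAN/COMMIT turns a batch of them into a single unit of work:

-- Auto-commit: each INSERT is its own transaction, with its own commit.
INSERT INTO dbo.SomeTable (SomeValue) VALUES (1);
INSERT INTO dbo.SomeTable (SomeValue) VALUES (2);

-- Explicit transaction: both INSERTs commit (or roll back) as one unit.
BEGIN TRAN;
        INSERT INTO dbo.SomeTable (SomeValue) VALUES (3);
        INSERT INTO dbo.SomeTable (SomeValue) VALUES (4);
COMMIT;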

Lately, I've spent a decent amount of time working with a couple of different clients to help diagnose potential problems with multi-tenant servers, where large numbers of transactions can have an ugly ‘aggregate’ effect when lots of writes are being sent to multiple databases at the same time. Accordingly, one of the things I've been paying a lot of attention to recently is the SQL Server Log Manager and how it works, including how it buffers data and so on. And, along those lines, I was reminded of the fact that in SOME cases, explicitly batching lots of (typically repeated) operations into a single, explicit transaction can actually do a lot of good in terms of reducing the disk activity generated by the transaction log.
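If you want to keep an eye on that Log Manager activity yourself, a rough sketch along these lines, using the log-flush counters exposed by sys.dm_os_performance_counters, is one way to do it:

-- Log-flush counters for the current database. The '/sec' counters are
-- cumulative values; sample them twice and take the difference to get a rate.
SELECT
        counter_name,
        cntr_value
FROM sys.dm_os_performance_counters
WHERE
        (counter_name LIKE N'Log Flushes/sec%'
          OR counter_name LIKE N'Log Bytes Flushed/sec%'
          OR counter_name LIKE N'Log Flush Waits/sec%')
        AND instance_name = DB_NAME();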

Proof is in the Pudding – Or: Only Attempt this with Appropriate Testing

Granted, this isn't something you'd necessarily believe without seeing some hard numbers to back up such a claim, nor is it something you'd want to blindly apply to all operations without checking the actual metrics as well. So, with that in mind, let's look at some examples.

To do that, you'll want to carve out a little sandbox for yourself – somewhere in a NON-production database. For me, that's my ‘Meddling’ database, which, as you can see, I start off with by creating a new table and then running some backups to make sure I can clear out any transactions in the transaction log:

USE Meddling
GO  
IF OBJECT_ID('dbo.TransactionTest','U') IS NOT NULL
        DROP TABLE dbo.TransactionTest
GO
CREATE TABLE dbo.TransactionTest (
        RowId int IDENTITY(1,1) NOT NULL, 
        OptionName nvarchar(40) NOT NULL, 
        OptionValue int NOT NULL
 )
GO  
BACKUP DATABASE Meddling TO DISK = N'D:\SQLBackups\Meddling_Full.bak'
WITH COMPRESSION
GO
BACKUP LOG Meddling TO DISK = N'D:\SQLBackups\Meddling_FULL.trn'
WITH COMPRESSION
GO
 

Then, once I've created a table and cleared everything out, I can check on the amount of space being used in my transaction log. The documented, or supported, way to do this is to use the DBCC SQLPERF() command with the LOGSPACE argument, but I personally hate how that shows me log-space info for every database on the instance. (Er, well, I hate that aspect when working on a single database – especially when the instance hosts a lot of them.) So, I prefer to use an admittedly ugly way of getting the same info – by using performance counters:

 

SELECT 
        dbs.name,
        ls.cntr_value / 1024 AS [Log Size (MB)],
        lu.cntr_value / 1024 AS [Log Used (MB)]
FROM 
        sys.databases dbs
        INNER JOIN sys.dm_os_performance_counters lu ON dbs.name = lu.instance_name
        INNER JOIN sys.dm_os_performance_counters ls ON dbs.name = ls.instance_name
WHERE
        lu.counter_name LIKE N'Log File(s) Used Size (KB)%' 
        AND ls.counter_name LIKE N'Log File(s) Size (KB)%'
        AND dbs.database_id = DB_ID()
 

The benefit of such an approach is that it's pretty easy to restrict results down to a single database.
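For comparison, the documented route mentioned above is a one-liner that returns one row per database on the instance, which is exactly the part that gets noisy:

-- Returns Database Name, Log Size (MB), Log Space Used (%), and Status
-- for every database on the instance.
DBCC SQLPERF(LOGSPACE);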

And, in my case, after a t-log backup, I'm using roughly 4MB of my 100MB log. With such a baseline set, I can then kick off a pretty simple operation where I'll log 20,000 back-to-back INSERTs by creating some bogus data from the master database's spt_values table (something that's really only good for demos):

 

SET NOCOUNT ON 
GO
DECLARE cursorsAreUgly CURSOR FAST_FORWARD 
FOR     
        SELECT TOP 20000 t1.name, t1.number 
        FROM master.dbo.spt_values t1 
                 LEFT JOIN master.dbo.spt_values t2 ON t1.number = t2.number
                 LEFT JOIN master.dbo.spt_values t3 ON t2.name = t3.name
                 LEFT JOIN master.dbo.spt_values t4 ON t3.name = t1.name
        ORDER BY NEWID()
        
DECLARE @name nvarchar(40), @number int
OPEN cursorsAreUgly
FETCH NEXT FROM cursorsAreUgly INTO @name, @number 
WHILE @@FETCH_STATUS = 0 BEGIN 
        INSERT INTO dbo.TransactionTest (OptionName, OptionValue)
        VALUES (ISNULL(@name,''), @number)
        FETCH NEXT FROM cursorsAreUgly INTO @name, @number 
END
CLOSE cursorsAreUgly
DEALLOCATE cursorsAreUgly
GO

And, while 20,000 INSERTs may seem like a lot (and it is), these are exceptionally narrow rows – so bear that in mind when comparing this technique against your own data.

Likewise, in my example, I’m using a cursor – but your code might be processing either INSERTs or UPDATEs much differently.
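For example, if your workload can be expressed as a single set-based statement instead of row-by-row processing, that statement is already a single auto-commit transaction and gets much of the same batching benefit on its own; a rough set-based sketch of the same bogus load might look like this:

-- One set-based INSERT...SELECT: a single auto-commit transaction rather
-- than 20,000 individual ones.
INSERT INTO dbo.TransactionTest (OptionName, OptionValue)
SELECT TOP 20000 ISNULL(t1.name, N''), t1.number
FROM master.dbo.spt_values t1
        LEFT JOIN master.dbo.spt_values t2 ON t1.number = t2.number
        LEFT JOIN master.dbo.spt_values t3 ON t2.name = t3.name
        LEFT JOIN master.dbo.spt_values t4 ON t3.name = t1.name
ORDER BY NEWID();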

The key thing, though, is that after running the cursor-based script above (which takes 13 seconds on my dev/testing server with its older/cruddy SATA drives), I end up with 13MB of my log file used.

If, however, I back up the transaction log again (to clear it out and get back down to around 4MB used), and then run the following operation, which wraps everything in an EXPLICIT transaction (something that might make you think this operation would take LONGER than the implicit version listed above):

 

SET NOCOUNT ON 
GO
BEGIN TRAN
        DECLARE cursorsAreUgly CURSOR FAST_FORWARD 
        FOR      
                 SELECT TOP 20000 t1.name, t1.number 
                 FROM master.dbo.spt_values t1 
                         LEFT JOIN master.dbo.spt_values t2 ON t1.number = t2.number
                         LEFT JOIN master.dbo.spt_values t3 ON t2.name = t3.name
                         LEFT JOIN master.dbo.spt_values t4 ON t3.name = t1.name
                 ORDER BY NEWID()
        DECLARE @name nvarchar(40), @number int
        OPEN cursorsAreUgly
        FETCH NEXT FROM cursorsAreUgly INTO @name, @number 
        WHILE @@FETCH_STATUS = 0 BEGIN 
                 INSERT INTO dbo.TransactionTest (OptionName, OptionValue)
                 VALUES (ISNULL(@name,''), @number)
                 FETCH NEXT FROM cursorsAreUgly INTO @name, @number 
        END
        CLOSE cursorsAreUgly
        DEALLOCATE cursorsAreUgly
COMMIT
GO

I end up with an operation that only took 8 seconds, and which only generated an additional 3MB of transaction log usage (taking me up to 7MB used total instead of 13MB – quite a nice savings). And, of course, if I watch disk performance counters, I can see that the disks are the primary source of contention during the first, non-transacted operation; they end up being my bottleneck because I'm thumping them over and over and over again.
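If you'd like harder numbers on that disk contention than a Perfmon eyeball, one option is to snapshot the cumulative write stats for the log file before and after each test and compare the deltas (a sketch, assuming the log file has the default file_id of 2):

-- Cumulative write stats for the current database's log file (file_id 2 in a
-- default, single-log-file layout). Capture before and after a test run and
-- diff the numbers.
SELECT
        num_of_writes,
        num_of_bytes_written,
        io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(DB_ID(), 2);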

In fact, if you’d like to get a better insight into what’s going on, run the tests above in your own environment and pay attention to your disk metrics AND take a peek at what’s being logged to the T-LOG itself by means of the most-excellent, and undocumented, T-SQL functionality made available by fn_dblog() as follows:

SELECT * FROM fn_dblog(NULL, NULL)

And, just make sure that you back up your t-log between each of your tests – to clear out used log space and to make it easier to see the differences in what's going on when looking at the fn_dblog() output.
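To make that comparison a little easier to digest, you can also aggregate the fn_dblog() output rather than scrolling through raw rows; a rough sketch along these lines makes the difference in log-record counts and volume jump out:

-- Summarize the active portion of the log: how many log records of each type
-- were generated, and how many bytes they add up to. Run after each test,
-- before the next log backup.
SELECT
        Operation,
        COUNT(*) AS LogRecordCount,
        SUM([Log Record Length]) AS TotalBytes
FROM fn_dblog(NULL, NULL)
GROUP BY Operation
ORDER BY TotalBytes DESC;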

Conclusion

Obviously, this blog post isn't an open invitation to run out and wrap every single operation in an explicit transaction. Instead, it's a reminder that when you know what's going on under the covers, performance benefits and boosts can come in what might otherwise seem to be surprising ways. That, and this post is a reminder that testing and hard numbers are a great way to make sure performance optimizations are applied where they're actually needed.