Why DATETIME can slow down your query

Oopps! Upgrade your browser pretty please. Oopps! Upgrade your browser pretty please.

Joins, CAST and UDFs are a bad mix

Transactional data is usually stored with some sort of time stamp. This time stamp is more often than not of type DATETIME, the data will look like this: ‘2016-02-05 15:22:15.167’

When you want to sum or group this data per day you run into problem because ‘2016-02-05 15:22:15.167’ and ‘2016-02-05 15:23:44.543’ is the same day but it’s not the same time.

A quick fix is to CAST the data to DATE,  CAST(@MyTimeStamp as DATE) or you could define a function that does it. The problem with this approach is that the query optimizer can not use the indexes but has to resort to a table scan which is VERY slow.

Solution

One solution is to define another column of type DATE where the date is stored without the time part. In the example in the code below the same query time goes from 2200 ms to 33 ms on my machine. That is 67 times faster.

A solution with a cache

If the data arrives in batches, usually early in the morning, and the same query is run over and over during the day in various reports, the result of the query can be cached.

In the example below the query time goes from 33ms to 0 ms. Yep, ZERO ms.

 


USE test;
GO

SET STATISTICS TIME ON;
--SET STATISTICS TIME OFF
-----------------------------------
--Create the test tables

IF OBJECT_ID('dbo.Dates1','U') IS NOT NULL
BEGIN
  DROP TABLE dbo.Dates1;
END;
GO

IF OBJECT_ID('dbo.Dates2','U') IS NOT NULL
BEGIN
  DROP TABLE dbo.Dates2;
END;
GO

CREATE TABLE Dates1(CalendarDateTime DATETIME,CalendarDate DATE);
GO
CREATE TABLE Dates2(CalendarDateTime DATETIME,CalendarDate DATE);
GO
-----------------------------------
--Insert random data
--Dates going back ~10 years

DECLARE
@i INT = 0;

WHILE @i < 50000
BEGIN
  --Dates going 10 years back
  INSERT INTO Dates1(CalendarDateTime)
  SELECT DATEADD(SECOND,-10 * 365 * 24 * 60 * 60 * RAND(),GETDATE());
  INSERT INTO Dates2(CalendarDateTime)
  SELECT DATEADD(SECOND,-10 * 365 * 24 * 60 * 60 * RAND(),GETDATE());
  SET @i = @i + 1;
END;
GO
--Update the column of type DATE

UPDATE Dates1
SET CalendarDate = CalendarDateTime;
GO

UPDATE Dates2
SET CalendarDate = CalendarDateTime;
GO
--Create indexes

CREATE NONCLUSTERED INDEX IX_Caldt ON dbo.Dates1(CalendarDateTime);
GO
CREATE NONCLUSTERED INDEX IX_Caldt ON dbo.Dates2(CalendarDateTime);
GO

CREATE NONCLUSTERED INDEX IX_Cald ON dbo.Dates1(CalendarDate);
GO
CREATE NONCLUSTERED INDEX IX_Cald ON dbo.Dates2(CalendarDate);
GO
-----------------------------------

-- SQL Server 2016 only
--DROP FUNCTION IF EXISTS dbo.DateOnlySlow
--go
--Create a very slow User Defined Function, UDF
--You should NOT try this at home 🙂

IF OBJECT_ID('dbo.DateOnlySlow','FN') IS NOT NULL
BEGIN
  DROP FUNCTION dbo.DateOnlySlow;
END;
GO

CREATE FUNCTION dbo.DateOnlySlow(@CalendarDateTime DATETIME)
RETURNS DATE
AS
BEGIN
  RETURN @CalendarDateTime;
END;
GO
-----------------------------------

--Slowest: Using User defined function, DateOnlySlow
SELECT TOP 10 dbo.DateOnlySlow(s1.CalendarDateTime),COUNT(*)
FROM Dates1 AS s1
JOIN Dates2 AS s2 ON dbo.DateOnlySlow
(s2.CalendarDateTime) = dbo.DateOnlySlow(s1.CalendarDateTime)
GROUP BY dbo.DateOnlySlow(s1.CalendarDateTime)
ORDER BY COUNT(*) DESC,dbo.DateOnlySlow(s1.CalendarDateTime);
GO

--Faster: Casting in the query
SELECT TOP 10 CAST(s1.CalendarDateTime AS DATE),COUNT(*)
FROM Dates1 AS s1
JOIN Dates2 AS s2 ON CAST(s2.CalendarDateTime AS DATE) = CAST(s1.CalendarDateTime AS DATE)
GROUP BY CAST(s1.CalendarDateTime AS DATE)
ORDER BY COUNT(*) DESC,CAST(s1.CalendarDateTime AS DATE);
GO

--Even faster: Casting one side in the join and using the column 'CalendarDate' on the other
SELECT TOP 10 CAST(s1.CalendarDateTime AS DATE),COUNT(*)
FROM Dates1 AS s1
JOIN Dates2 AS s2 ON s2.CalendarDate = CAST(s1.CalendarDateTime AS DATE)
GROUP BY CAST(s1.CalendarDateTime AS DATE)
ORDER BY COUNT(*) DESC,CAST(s1.CalendarDateTime AS DATE);
GO

--Fastest: Joining only on 'CalendarDate
SELECT TOP 10 s1.CalendarDate,COUNT(*)
FROM Dates1 AS s1
JOIN Dates2 AS s2 ON s2.CalendarDate = s1.CalendarDate
GROUP BY s1.CalendarDate
ORDER BY COUNT(*) DESC,s1.CalendarDate;
GO

-----------------------------------
--Awesome speed: Pre-processing the query and cache the result in ##IncludedSalesDates

-- SQL Server 2016 only
--DROP TABLE IF EXISTS ##IncludedSalesDates

IF OBJECT_ID('tempdb..##IncludedSalesDates','U') IS NOT NULL
BEGIN
  DROP TABLE ##IncludedSalesDates;
END;

SELECT s1.CalendarDate,COUNT(*) AS num
INTO ##IncludedSalesDates
FROM Dates1 AS s1
JOIN Dates2 AS s2 ON s2.CalendarDate = s1.CalendarDate
GROUP BY s1.CalendarDate;

CREATE CLUSTERED INDEX IX_Cald ON dbo.##IncludedSalesDates(num DESC,CalendarDate);
GO

SELECT TOP 10 *
FROM ##IncludedSalesDates
ORDER BY num DESC,CalendarDate;
GO