SQL Server statistics – multicolumn statistics

Oopps! Upgrade your browser pretty please. Oopps! Upgrade your browser pretty please.

As you already know, in SQL server performance is depending on correct statistics. The optimizer in SQL engine uses the statistics to create execution plans, and if the statistics are incorrect or out of date, the optimizer is misguided, with inefficient execution plans.

The statistics are auto created by default when you are using a filter on a column or when you create an index, BUT did you know that multicolumn statistics are not created automatically? Well, for multi column indexes, the statistics are created.

Well, how does that affect you and your performance? For some cases it will not affect you at all, like when the tables are small and your T-SQL statements are simple, but if you have large tables and lots of joins etc, you might wonder why everything is so slow.

 After some investigations you find an execution plans that is running slow and by looking at some of the access methods of the plan you find a big difference in estimated vs actual number of rows.

 

 

After running sp_updatestats, the estimates are still wrong, why? Well the optimizer looks at two different single column statistics and makes an incorrect estimate.

Try this sample:

— create a table —

CREATE TABLE [dbo].[StatsTest](
[ID] [int] IDENTITY(1,1) NOT NULL,
[User][varchar](50)NULL,
[Company][varchar](50)NULL,
CONSTRAINT [PK_dbo.StatsTest]PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX  =OFF,STATISTICS_NORECOMPUTE  =OFF,IGNORE_DUP_KEY=OFF,ALLOW_ROW_LOCKS  =ON,ALLOW_PAGE_LOCKS  =ON)ON[PRIMARY]
)ON[PRIMARY]
GO

— insert some values —
INSERT INTO [dbo].[StatsTest]([user],[company])
VALUES(‘Håkan Winther’,‘SQL Service’)
GO 10000 –add 10000 records

— insert some values —
INSERT INTO [dbo].[StatsTest]([user],[company])
VALUES(‘Håkan Winther1’,‘SQL Service’)
GO 10000

— insert some values —
INSERT INTO [dbo].[StatsTest]([user],[company])
VALUES(‘Håkan Winther’,‘SQL Service1’)
GO 10000

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

— select some data —
SELECT * FROM[dbo].[StatsTest]
WHERE [USER]=‘Håkan Winther1’
AND Company=‘SQL Service’;

— check the stats to see if we got a multi column stats —
SELECT OBJECT_NAME([S].[object_id]),s.name,c.name,[S].[auto_created]
FROM sys.[stats] AS S
INNER JOI Nsys.[stats_columns]AS SC
ON [S].[object_id]=[SC].[object_id]
AND [S].[stats_id]=[SC].[stats_id]
INNER JOIN sys.columns c
ON sc.[object_id]=c.[object_id]
AND sc.[column_id]=c.[column_id]
WHERE s.[object_id]=OBJECT_ID(‘dbo.StatsTest’);

— create statistics manually —
CREATE STATISTICSmyStatsONdbo.StatsTest([user],Company);
GO

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO

— select from the table again —
SELECT * FROM [dbo].[StatsTest]
WHERE [USER] =‘Håkan Winther’
AND Company=‘SQL Service1’;

— check the stats to see if we got a multi column stats — 
SELECT OBJECT_NAME([S].[object_id]),s.name,c.name,[S].[auto_created]
FROM sys.[stats] AS S
INNER JOIN sys.[stats_columns] AS SC
ON [S].[object_id]=[SC].[object_id]
AND [S].[stats_id]=[SC].[stats_id]
INNER JOIN sys.columns c
ON sc.[object_id]=c.[object_id]
AND sc.[column_id]=c.[column_id]
WHERE s.[object_id]=OBJECT_ID(‘dbo.StatsTest’);

DBCC SHOW_STATISTICS(‘dbo.StatsTest’,‘myStats’)

DROP TABLE dbo.StatsTest;

As you can see, the estimates are more correct when you have a multicolumn statistics 

Solution:

You have two options to solve this problem, the best option is to create a multicolumn index.
Another options is to create multicolumn statistics

If you want to know more, feel free to contact any of our SQL server experts and we will help you.