SQL server indexing strategies part 3 – non clustered indexes

Oopps! Upgrade your browser pretty please. Oopps! Upgrade your browser pretty please.

This is my third blog post about indexing strategies for SQL server and it has passed some time since the last one, but now it’s time for the next.

In SQL server 2005, Microsoft implemented something called included columns, and although it’s more than 7 years since SQL server 2005 was brought to life, I still get the questions:

  • What is an included column?
  • Why do I need included columns?
  • How should I use included columns?

 I think it is time to repeat the answers and give some kind of explanation.

First of, lets explain the difference between the clustered and the non clustered index in a simple way. Think of a book with an index. The index of the book give you a structured way of finding the right page to read. A clustered index is like a book, you have the pages with all the information and you have an index pointing at the correct page.

You can seek the index to find the page or you can scan all the pages to get what you want, but you can only seek the index by the key columns. If you don’t have any values to search for in the key columns, you have to scan the pages.

The non clustered index is more like a quick reference card where you have some other key columns in the index, but the non clustered index doesn’t have the complete information in the pages. The pages in the non clustered index contains the clustered index keys AND the INCLUDED COLUMNS.

By searching the non clustered index, you’ll only find the key words you need to search the clustered index to get the rest of the information.

Assume you want to get all columns for a customer record and the clustering key in the clustered index is a counter. You probably do not know the value of the counter for a specific record, but you know the customer name. To get some performance in the query you add non clustered index on customer name.

Let’t take a look at some code:

CREATE SCHEMA SALES GO

CREATE TABLE [Sales].[Customer](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CustomerName] [varchar](75) COLLATE Finnish_Swedish_CI_AS NOT NULL,
[CustomerGroup] [int] NULL,
[CustomerType] [int] NULL,
CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED ( [ID] ASC ))
GO

CREATE NONCLUSTERED INDEX ix_Customer_AK ON Sales.Customer ( CustomerName )
GO
INSERT INTO Sales.Customer(CustomerName,CustomerType) VALUES(‘Håkan Winther’, 1),(‘SQL Service’, 2)

— First select
SELECT * FROM Sales.Customer WHERE CustomerName=‘Håkan Winther’

DROP INDEX ix_Customer_AK ON Sales.Customer;
GO

CREATE NONCLUSTERED INDEX ix_Customer_AK ON Sales.Customer ( CustomerName )INCLUDE(CustomerType) GO

— second select
SELECT ID,CustomerName,CustomerType FROM Sales.Customer WHERE CustomerName=‘Håkan Winther’
— the third select
SELECT ID,CustomerName,CustomerType FROM Sales.Customer WHERE CustomerType=1

 

If you take a look at the actual execution plan you can see something interesting. The first select using select * cannot get all the information from the nonclusterd index and needs to go to the clustered index to get all columns. It can be done in tw ways, a clustered index scan or an index seek + a key lookup. Either way, this will not give you the best performance.

In the second select you’ll get all the columns you need from the nonclustered index as Clustering key (ID) is always included in all non clustered indexes and CustomerType is specified in the include clause. In the execution plan, you’ll find an index seek on the non clustered index. This is what’s called covering index.

As you can see in the third select, you’ll get an index scan in the non clustered index as CustomerType is not one of the index keys, it’s only included at the “page” level.

You might ask yourself, why don’t I use all columns as key columns in the index. Well there are some limitations, like you can’t have more than 16 columns as key columns and you’ll lose performance.

>>Conclusion<<
You need included columns to get the performance when you need to search on columns that is not part of the clustered index, by avoiding clustered index scans and index seek + key lookup.

Well, I hope you learned something from this blog post or that it was crystal clear from the beginning.