SQL Server performance – death by computed column

Oopps! Upgrade your browser pretty please. Oopps! Upgrade your browser pretty please.

Computed columns

In SQL Server you can add tables with columns that are computed based on other columns in the same table or computed based on deterministic functions. I guess many developers would say that this kind of business logic has nothing to do in the database, and in some cases they may even be right, but….

… there are some cases that make sense to use a computed column in the database. One of the most obvious is when you need to filter the data based on the result of the computation, for example:

  • when you use json and need to filter based on values in the json document
  • when you want to calculate line total in a “order detail” table and get all discount above a certain limit.

Assume you have a table like this:

CREATE TABLE [Sales].[OrderDetail](
[OrderID] [int] NOT NULL,
[OrderDetailID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[Qty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[Discount%] [money] NOT NULL,
[LineTotal] [money] NOT NULL,
[LineDiscount] AS (isnull(([LineTotal]*[Discount%])*[Qty],(0.0))),
CONSTRAINT [PK_OrderDetail_OrderID_OrderDetailID] PRIMARY KEY CLUSTERED
(
[OrderID] ASC,
[OrderDetailID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

The problem

As you can see in this table, there is a computed column to calculate the LineDiscount. The drawback with this computed column is that it has to be calculated for every row I read from the table. In this case it is a very easy calculation, but I have seen very complex calculations and string manipulations in computed columns and they basically killed the performance of the queries. If I try to write a query like this:

SELECT OrderID FROM Sales.OrderDetail WHERE LineDiscount>=1000

This query will do a clustered index scan, compute the result and then filter the records. This can be a really heavy performance issue, but I can actually add an index on the computed column that will materialize the result of the calculation. This will improve the performance as long as the index is used. Queries that includes the computed column and don’t use the index will get an overhead for every row that has to be calculated.

The solution

The solution to this problem is to create the computed column as “PERSISTED” by adding the keyword PERSISTED after your formula when you create your table or if the table is already created you can alter the column to be persisted like this:

ALTER TABLE Sales.OrderDetail ALTER COLUMN LineDiscount ADD PERSISTED;

You can read more about computed columns at:

https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table