SQL Server – Windowing functions and Aggregates

Oopps! Upgrade your browser pretty please.Oopps! Upgrade your browser pretty please.

Windowing functions i SQL Server is a valuable feature if you want to aggregate data for specific rows within a dataset without loosing the details of the dataset. If you use Group by to aggregate the data, you loose the details, but with windowing functions you still get all the rows in the dataset. This is very useful when you want to calculated running totals and sliding average, for example when you want to see all customer orders and a running total for all of the orders in sequence like this:

[code lang=”SQL”]
SELECT
dd.[FiscalYear]
,dd.[EnglishMonthName]
,[ResellerKey]
,CumulativeTotal = SUM([frs].[SalesAmount]) OVER(PARTITION BY [ResellerKey],[FiscalYear],[EnglishMonthName] ORDER BY [FiscalYear],[EnglishMonthName])
FROM [AdventureWorksDW2008R2].[dbo].[FactResellerSales] frs
INNER JOIN [AdventureWorksDW2008R2].dbo.[DimDate] dd
ON frs.[OrderDateKey] = dd.[DateKey]
ORDER BY [frs].[ResellerKey], dd.[FiscalYear], dd.[EnglishMonthName]
[/code]

In this example I list all the orders for a reseller and calculates the running total for the reseller and I am only interested in which year and month, but the data includes days in the date dimension. The day dimension makes the reseller, year and month to be repeated and the cumulative total is duplicated for every row, but if I only want the cumulative value for each month, how do I accomplish this?

Well the obvious answer is group the data on reseller, year and month, but there is a minor problem with the grouping. Try to add the group by clause to the query like this:

[code lang=”SQL”]
SELECT
dd.[FiscalYear]
,dd.[EnglishMonthName]
,[ResellerKey]
,TotalSales = SUM([SalesAmount])
,CumulativeTotal = SUM([frs].[SalesAmount]) OVER(PARTITION BY [ResellerKey],[FiscalYear],[EnglishMonthName] ORDER BY [FiscalYear],[EnglishMonthName])
FROM [AdventureWorksDW2008R2].[dbo].[FactResellerSales] frs
INNER JOIN [AdventureWorksDW2008R2].dbo.[DimDate] dd
ON frs.[OrderDateKey] = dd.[DateKey]
GROUP BY dd.[FiscalYear], dd.[EnglishMonthName], [ResellerKey]
ORDER BY [frs].[ResellerKey], dd.[FiscalYear], dd.[EnglishMonthName]
[/code]

If you execute this, you will get the error “Msg 8120, Level 16, State 1, Line 20 Column ‘AdventureWorksDW2008R2.dbo.FactResellerSales.SalesAmount’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.” because the column SalesAmount is not in the select list.

To solve this you could try to first aggregate the data and then calculate the cumulative total by using a common table expression like this:

[code lang=”SQL”]
WITH cte
AS
(
SELECT
dd.[FiscalYear]
,dd.[EnglishMonthName]
,[ResellerKey]
,TotalSales = SUM([SalesAmount])
FROM [AdventureWorksDW2008R2].[dbo].[FactResellerSales] frs
INNER JOIN [AdventureWorksDW2008R2].dbo.[DimDate] dd
ON frs.[OrderDateKey] = dd.[DateKey]
GROUP BY dd.[FiscalYear], dd.[EnglishMonthName], [ResellerKey]
)
SELECT
*,
CumulativeTotal = SUM([cte].[TotalSales]) OVER(PARTITION BY [cte].[ResellerKey] ORDER BY [cte].[FiscalYear],cte.[EnglishMonthName])
FROM [cte]
ORDER BY [ResellerKey], [FiscalYear], [EnglishMonthName]
[/code]

Well, this works but as you can see the code looks more complicated, but there is a less obvious solution to this. You could in fact stick to the aggregated query without the common table expression, but use the windowing function on the aggregated SalesTotal like this:

[code lang=”SQL”]SELECT
      dd.[FiscalYear]
  ,dd.[EnglishMonthName]
      ,[ResellerKey]
      ,TotalSales = SUM([SalesAmount])
  ,CumulativeTotal = SUM(SUM([frs].[SalesAmount])) OVER(PARTITION BY [ResellerKey] ORDER BY [FiscalYear],[EnglishMonthName])
FROM [AdventureWorksDW2008R2].[dbo].[FactResellerSales] frs
INNER JOIN [AdventureWorksDW2008R2].dbo.[DimDate] dd
 ON frs.[OrderDateKey] = dd.[DateKey]
GROUP BY      
  dd.[FiscalYear]
  ,dd.[EnglishMonthName]
      ,[ResellerKey]
ORDER BY [frs].[ResellerKey], dd.[FiscalYear], dd.[EnglishMonthName]
[/code]

 

 

Notice that I aggregate on the SUM of SalesAmount, and this works very well. When you think about it for a while it all makes sense, the SalesAmount is not part of the dataset and that’s why you can’t use it in the windowing function, but the sum of SalesAmount is part of the dataset.

What about the execution time and execution plan? Well, the last two has the same execution plan and execution time