Categories: TSQL

Nesting Aggregates with Window Functions

Recently I was writing an article for MSSQLTips where I had to create a treemap (it will be published soon). As sample data, I used the folders containing the drafts of all the tips I have ever written. As measures, I use the total size in kilobytes and the number of items per folder. An example:

[Image: mixingagg01]

As you can see, the measures are easy to calculate using the GROUP BY clause and the standard aggregate functions SUM and COUNT. Nothing special here.
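The screenshot is not reproduced here, so below is a minimal sketch of this kind of query. It assumes one row per file with the [Year], [Tip] and [Size] columns that appear in the query later in this post. The temp table #FileSizes and all of its rows are made up for illustration. The setup uses a multi-row VALUES list, so it needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data: one row per file, [Size] in kilobytes.
-- Folder names and sizes are made up for illustration.
IF OBJECT_ID('tempdb..#FileSizes') IS NOT NULL DROP TABLE #FileSizes;

CREATE TABLE #FileSizes
(
     [Year] INT         NOT NULL
    ,[Tip]  VARCHAR(50) NOT NULL
    ,[Size] INT         NOT NULL -- file size in KB
);

INSERT INTO #FileSizes ([Year],[Tip],[Size])
VALUES
     (2013, 'Tip A', 10), (2013, 'Tip A', 20), (2013, 'Tip A', 30)
    ,(2013, 'Tip B', 15), (2013, 'Tip B', 25)
    ,(2014, 'Tip C', 5), (2014, 'Tip C', 5), (2014, 'Tip C', 5)
    ,(2014, 'Tip C', 5), (2014, 'Tip C', 5);

-- One row per folder: total kilobytes and number of items (files).
SELECT
     [Year]
    ,[Tip]
    ,Size          = SUM([Size])
    ,NumberOfItems = COUNT(1)
FROM #FileSizes
GROUP BY [Year],[Tip]
ORDER BY [Year],[Tip];
```

With these made-up rows, the query returns Tip A (60 KB, 3 items), Tip B (40 KB, 2 items) and Tip C (25 KB, 5 items).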
Now I want to calculate the highest number of items found in any folder, but without resorting to subqueries or common table expressions (CTEs): I want to keep the query clean. Window functions can easily do the trick, of course. With an empty OVER clause, the window spans the entire result set, so MAX returns the highest value over all rows. The same syntax can be used to calculate a grand total, for example. This syntax is supported from SQL Server 2005 onwards.

[Image: mixingagg02]
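As the query from the screenshot is not available as text, here is a minimal sketch of the idea: a window aggregate with an empty OVER clause wrapped around a grouped aggregate, plus the grand-total variant. The temp table #FileSizes and its rows are hypothetical sample data (one row per file, [Size] in KB), not the original folders.

```sql
-- Hypothetical sample data (made up): one row per file, [Size] in KB.
IF OBJECT_ID('tempdb..#FileSizes') IS NOT NULL DROP TABLE #FileSizes;
CREATE TABLE #FileSizes ([Year] INT NOT NULL, [Tip] VARCHAR(50) NOT NULL, [Size] INT NOT NULL);
INSERT INTO #FileSizes ([Year],[Tip],[Size])
VALUES (2013,'Tip A',10),(2013,'Tip A',20),(2013,'Tip A',30)
      ,(2013,'Tip B',15),(2013,'Tip B',25)
      ,(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5);

SELECT
     [Year]
    ,[Tip]
    ,Size           = SUM([Size])   -- per-folder total (GROUP BY)
    ,NumberOfItems  = COUNT(1)      -- per-folder item count (GROUP BY)
    -- window aggregate over the grouped COUNT: highest item count of all folders
    ,MaxItems       = MAX(COUNT(1)) OVER ()
    -- same pattern for a grand total: sum of the per-folder sums
    ,GrandTotalSize = SUM(SUM([Size])) OVER ()
FROM #FileSizes
GROUP BY [Year],[Tip]
ORDER BY [Year],[Tip];
```

With these made-up rows, every folder row shows MaxItems = 5 and GrandTotalSize = 125.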

And now we have something special: nested aggregates. We have the COUNT aggregate, which belongs to the GROUP BY, and the MAX aggregate, which is a window function. We can reference the result of the COUNT function inside the window function because window functions are evaluated after the GROUP BY. In a way, the query is equivalent to this:

SELECT
	 [Year]
	,[Tip]
	,Size		= SUM([Size])
	,NumberOfItems	= COUNT(1)
	,MaxItems	= MAX(NumberOfItems) OVER () -- conceptual only: NumberOfItems is an alias from the same SELECT
FROM [dbo].[FileSizes]
GROUP BY [Year],[Tip];

This syntax is not valid, of course, since you cannot reference an alias in the same SELECT statement, but you get the point. It's possible to nest aggregates inside each other, but it can lead to confusing T-SQL, such as SUM(SUM(myColumn)) OVER (). On the other hand, it leads to concise code. If you nest aggregates, consider adding some comments to clarify the code for the person coming after you.
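For comparison, a sketch of a valid way to write the conceptual query: aggregate first in a CTE, then apply the window function to the aliased column. This is exactly the extra layer the nested form avoids. The second query shows a nested aggregate with the kind of explanatory comments suggested above. The temp table #FileSizes, its rows and the PctOfTotalSize column are hypothetical, made up for illustration.

```sql
-- Hypothetical sample data (made up): one row per file, [Size] in KB.
IF OBJECT_ID('tempdb..#FileSizes') IS NOT NULL DROP TABLE #FileSizes;
CREATE TABLE #FileSizes ([Year] INT NOT NULL, [Tip] VARCHAR(50) NOT NULL, [Size] INT NOT NULL);
INSERT INTO #FileSizes ([Year],[Tip],[Size])
VALUES (2013,'Tip A',10),(2013,'Tip A',20),(2013,'Tip A',30)
      ,(2013,'Tip B',15),(2013,'Tip B',25)
      ,(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5),(2014,'Tip C',5);

-- Valid version of the conceptual query: the GROUP BY happens in the CTE,
-- so the outer query can reference the NumberOfItems alias.
WITH FolderStats AS
(
    SELECT
         [Year]
        ,[Tip]
        ,Size          = SUM([Size])
        ,NumberOfItems = COUNT(1)
    FROM #FileSizes
    GROUP BY [Year],[Tip]
)
SELECT
     [Year]
    ,[Tip]
    ,Size
    ,NumberOfItems
    ,MaxItems = MAX(NumberOfItems) OVER ()
FROM FolderStats
ORDER BY [Year],[Tip];

-- Nested aggregate, commented for the next reader:
--   SUM([Size])                = total size per folder (evaluated by the GROUP BY)
--   SUM(SUM([Size])) OVER ()   = grand total over all folders (window function,
--                                evaluated after the GROUP BY)
SELECT
     [Year]
    ,[Tip]
    ,PctOfTotalSize = 100.0 * SUM([Size]) / SUM(SUM([Size])) OVER ()
FROM #FileSizes
GROUP BY [Year],[Tip]
ORDER BY [Year],[Tip];
```

Both forms give MaxItems = 5 for the made-up rows; the nested version simply avoids the extra CTE layer.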

For more information, read the book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan. It's a great book, not too long, and it explains in great detail how window functions work. It also explains the logical order in which SQL Server evaluates the clauses of a query.


Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight into their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.
