Cool Stuff in Snowflake – Part 2: LISTAGG

I’m starting a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think is interesting enough to share. It might be some SQL function that I’d really like to see in SQL Server, or it might be something else.

This blog post talks about the LISTAGG function. The goal of the function is to concatenate values from a column into a delimited list. Let’s take a look at an example. I have a simple table with one column and 250 rows, containing a list of the top 250 movies from IMDB. If I want one single row with all the movie titles concatenated, I can issue the following SQL statement:

select LISTAGG(MOVIES,', ') from STAGE.TEST

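If you want one list per group, you add a GROUP BY clause on the grouping column, for example to return a list of concatenated employee names per department. The optional WITHIN GROUP clause does not do the grouping; it specifies the order of the values inside each list.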
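
A minimal sketch of the ordering part, reusing the STAGE.TEST table and MOVIES column from the statement above. The WITHIN GROUP clause is attached to the LISTAGG call and sorts the titles alphabetically before they are concatenated; the alias MOVIE_LIST is just an illustrative name:

```sql
-- Snowflake: one row with all titles, sorted alphabetically
select LISTAGG(MOVIES, ', ') WITHIN GROUP (ORDER BY MOVIES) as MOVIE_LIST
from STAGE.TEST;
```

Without the WITHIN GROUP clause the statement still runs, but the order of the titles in the list is not something you should rely on.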

Since SQL Server 2017, you have the STRING_AGG function, which has almost the exact same syntax as its Snowflake counterpart. There are two things to keep in mind:

  • Snowflake has an optional DISTINCT, which STRING_AGG does not support.
  • Neither function guarantees the order of the concatenated values unless you specify one in the WITHIN GROUP clause. In SQL Server, ascending is only the default direction of the ORDER BY inside that clause, so leave nothing to chance and always specify the sorting you want.
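
For comparison, a minimal sketch of the SQL Server side, assuming the dbo.Employees table (with DepartmentID and EmployeeName columns) that is used in the examples further down in this post. The sort order is spelled out in the WITHIN GROUP clause, so the result does not depend on any default:

```sql
-- SQL Server 2017 or later: one comma-separated list of names per department
SELECT
     DepartmentID
    ,STRING_AGG(EmployeeName, ',') WITHIN GROUP (ORDER BY EmployeeName ASC) AS Employees
FROM dbo.Employees
GROUP BY DepartmentID;
```

Note that the separator is a required argument of STRING_AGG, while the delimiter of LISTAGG is optional.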

If you are working on a version of SQL Server before 2017, you’ll appreciate the simplicity of the LISTAGG/STRING_AGG function, since you have to resort to some hacks to get the job done. My favorite article which lists a lot of potential solutions is Concatenating Row Values in Transact-SQL. My favorite method is the “black-box XML” method (I still had to look up the syntax every time). It’s ugly, but quite fast. Each name is prefixed with a comma, and the STUFF function is used to remove the leading comma of the resulting list.

SELECT
     e1.DepartmentID
    ,Employees =
       STUFF(
       (  -- build ',name1,name2,...' for the current department
          SELECT ',' + EmployeeName
          FROM dbo.Employees e2
          WHERE e2.DepartmentID = e1.DepartmentID
          ORDER BY EmployeeName
          FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
       ,1,1,'') -- replace the first character (the leading comma) with an empty string
FROM dbo.Employees e1
GROUP BY e1.DepartmentID;

Yep, using LISTAGG is much easier. The previous example can be rewritten as:

SELECT
     DepartmentID
    ,LISTAGG(DISTINCT EmployeeName, ',') WITHIN GROUP (ORDER BY EmployeeName) AS Employees
FROM dbo.Employees
GROUP BY DepartmentID;

Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight in their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.

Comments

  • Thanks for these, Koen. These posts are great. I'm new to Snowflake, and it's really helpful to get some new ideas presented in a way I can connect to my SQL Server background. I've only made it through the first three posts so far, but I'm finding them very interesting!

    As a side note, if the ONLY thing you need back from SQL Server is the single string of the names, you could also get to it by populating a variable:

    DECLARE @NameList VARCHAR(MAX) = '';
    SELECT @NameList = @NameList + EmployeeName + ',' FROM dbo.Employees ORDER BY EmployeeName;
    SELECT SUBSTRING(@NameList, 1, LEN(@NameList)-1) -- The substring is just to remove the trailing comma

    Or, rather than removing the trailing comma, you could avoid it altogether by placing it ahead of each name with a CASE statement:
    DECLARE @NameList VARCHAR(MAX) = '';
    SELECT @NameList = @NameList + CASE @NameList WHEN '' THEN '' ELSE ',' END + EmployeeName FROM dbo.Employees ORDER BY EmployeeName;
    SELECT @NameList;

  • LISTAGG is also a nice way to get around issues sending successive rows to a UDF/UDTF - just LISTAGG the rows and send with the partitioned key.
