Cool Stuff in Snowflake – Part 14: Asynchronous Execution of SQL Statements

I’m doing a little series on some of the nice features/capabilities in Snowflake (the cloud data warehouse). In each part, I’ll highlight something that I think it’s interesting enough to share. It might be some SQL function that I’d really like to be in SQL Server, it might be something else.

It’s been a while since I blogged about Snowflake, but a recent LinkedIn post caught my attention: the ability to add asynchronous execution of SQL statements in a stored procedure. In other words: parallel execution of SQL statements. This got me excited, because in my opinion this is something that has been missing in T-SQL since forever. Every time you want to do something in parallel, you need to use external tools to accomplish this in SQL Server (or Azure SQL DB, or Fabric Warehouse, or Fabric SQL DB, or … you get the point). You needed to use SQL Server Agent Jobs, or SSIS packages, or Azure Data Factory and so on.

Snowflake introduces the ASYNC and AWAIT keywords, which can be used to trigger asynchronous execution. People familiar with programming languages such as C# are probably pretty familiar with these constructs, but I guess most SQL people might need a short demo. Suppose I create the following table in Snowflake:

CREATE TABLE dbo.ParallelTest(
     id INT NOT NULL AUTOINCREMENT
    ,message VARCHAR(500) NOT NULL
    ,insertDate TIMESTAMP_NTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);

We can insert data into this table using an INSERT statement like this:

insert into dbo.ParallelTest(message)
values('This is an initial test!');

We can now create a stored procedure that will insert three rows in parallel into this table:

CREATE OR REPLACE PROCEDURE dbo.test_parallel_inserts()
RETURNS VARCHAR
LANGUAGE SQL
AS
BEGIN
  ASYNC (INSERT INTO dbo.ParallelTest(message) VALUES('First Insert'));
  ASYNC (INSERT INTO dbo.ParallelTest(message) VALUES('Second Insert'));
  ASYNC (INSERT INTO dbo.ParallelTest(message) VALUES('Third Insert'));
  AWAIT ALL;
END;

They key here is to specify ASYNC for each statement that needs to be executed asynchronously. We can use AWAIT for each statement individually, or you can use AWAIT ALL. Let’s run the stored procedure:

CALL dbo.test_parallel_inserts();

When we retrieve the results, we can see that the insert timestamps are nearly the same:

The values for the autoincrement column are a bit wonky because the identity values are not guaranteed to be in increasing order. If you don’t specify ORDER or NOORDER at table creation time, the session parameter NOORDER_SEQUENCE_AS_DEFAULT determines the ordering and the default is NOORDER.

Obviously, introducing a feature like this comes with risk. Even without parallel execution of statements there are common issues in databases like locking, deadlocking and unpredictable results due to poor coding, and adding such a feature might make it worse if not treaded carefully. But in my opinion, a feature like this can be quite useful in an environment where the execution of SQL code is heavily regulated, such as the ETL for a data warehouse. One might ask themselves though if parallel execution should be down that deep in the database code, or rather more upstream in a tool such as ADF for example. Food for thought.


------------------------------------------------
Do you like this blog post? You can thank me by buying me a beer 🙂

Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight in their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.