… with this one weird little trick 🙂 Sorry for the clickbait, but you can actually save a lot on ADF pricing if you design your pipelines carefully. And as we will see, don’t do everything in ADF. Before we proceed further, I advise to watch this short YouTube video by Cathrine Wilhelmsen on ADF cost. It explains the basics very well and demonstrates the concepts of the pricing with a few examples:
As you might’ve noticed, pricing in ADF is not the same as it was in SSIS for example. In SSIS, you pay your SQL Server license and you’re done (well, and you buy a server to run it on). It doesn’t matter what you do with SSIS, the cost is the same. If you run 1 package or 1000 packages, there’s no difference except in your electricity bill. However, in ADF you pay more if you use it more. You pay for each action you do, you pay for each activity you use and for how long things are running. There are a couple of guidelines you can follow to try to minimize costs:
You might’ve noticed the most expensive part in ADF is the Copy Activity. Its cost depends on the following formula:
If you have a metadata-driven Copy Activity inside a For Each loop, things can get expensive fast. For example, I have a pipeline that extracts data from a REST API. I need to extract 100 endpoints, which means I get 100 minutes of copy duration. Using the formula and price points from the video, this means an execution cost of approximately $0.83 per pipeline run. This doesn’t seem much, but this is $25 for a whole month (one execution per day). For a single pipeline. If you have multiple pipelines like this one, you can see why it can get expensive. Or if you increase the number of executions. Running the pipeline every hour is about $600 a month. For a single pipeline. Executing it every 5 minutes is about $7200. Whoops. Do not use ADF as a streaming tool. Luckily there’s still a part of the formula we can influence: the number of used DIUs (Data Integration Units). Cathrine also mentions this in the video. The default setting is Auto, which scales automatically but starts at 4.
However, you can set it manually to 2.
This means you can already save 50% on your Copy Activity cost! Obviously, this affects performance and you should only do this when working with smaller data sets. But if you just have to load some smaller files or some REST API data, definitely set it to 2.
How can we cut costs even more? By avoiding the Copy Activity altogether. In my use case of the REST API, I wrote an Azure Function that extracts data from the API and dumps it into a staging table using SqlBulkCopy. This means an external activity is used, which is much cheaper. By using the Azure Function, I reduced the price of my pipeline to $0.05 per month (for a daily run). Even if I would run the pipeline hourly, it would cost me a mere $1.09 in ADF. You have to pay for the Azure Function as well of course, but if you use a consumption plan the first million (!) executions and 400,000 GB/s are free (per subscription). So it’s possible you’re staying in the free part of the offering, which means $0 cost.
Even if you go over the free offering, the costs are cheaper than ADF. Keep in mind though, the development effort is higher when using Azure Functions because you’ll have to write the code yourself, while the Copy Activity is configured in a couple of minutes.
I recently read the book Agile Data Warehouse Design - Collaborative Dimensional Modeling, from Whiteboard…
You can find the slides for the session Building the €100 data warehouse with the…
I was asked to do a review of the book Microsoft Power BI Performance Best…
This is a quick blog post, mainly so I have the code available if I…
Praise whatever deity you believe in, because it's finally here, a tenant switcher for Microsoft…
This book was making its rounds on social media, and the concept seems interesting enough…
View Comments
How Azure Data Factory Pricing Works (31 Dec 2020)
https://www.youtube.com/watch?v=n4kJO72yb2w
Azure Data Factory, Pipeline, settings, Data integration unit: select 2 (default is auto)
(remove unused pipelines)
Data Pipeline pricing (Azure Data Factory):
https://azure.microsoft.com/en-gb/pricing/details/data-factory/data-pipeline/
Azure Data Factory
https://azure.microsoft.com/en-gb/products/data-factory/
Azure pricing:
https://azure.microsoft.com/en-us/pricing/