A while ago I had a little blog post series about cool stuff in Snowflake. I’m doing a similar series now, but this time for Microsoft Fabric. I’m not going to cover the basics of Fabric; hundreds of bloggers have already done that. I’m going to cover little bits & pieces that I find interesting, that are similar to Snowflake features, or that are an improvement over the “regular” SQL Server or related products.
In this blog post I’m going to talk about shortcuts. They’re a very interesting feature of Fabric allowing you to reference data (either in a table or in a file) without actually copying the data around. Why can this be useful? The architecture of a “modern data platform” nowadays looks something like this (bit of an oversimplification and overdramatization at the same time):
Multiple layers where data is stored and transformed. If you’re unlucky, a data vault with 50 extra layers is thrown into the mix as well. I’ve seen large clients with 7 or more layers in their data platform. Between layers, data is transformed from one format to another or business calculations take place. My point is, there are multiple copies of the same data. With shortcuts in Fabric, you can try to avoid some of those copies.
For example, suppose you have some files sitting in an Azure Data Lake Storage container. Instead of copying them over to Fabric and paying twice for the storage, you can just create a shortcut:
If you’re linking to files, you’ll see them pop up in the Files section of the lakehouse (with the little paperclip icon):
Some of those files are quite big to copy over, especially if you have a process that runs each day. Now I can refer to those files (for example in a notebook) without having to duplicate them. In Snowflake, you can create an external stage to link to files outside of Snowflake, and then you can ingest the data using a COPY INTO statement for example.
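To make the notebook part a bit more concrete, here is a minimal sketch of how such a file could be addressed from Python. The workspace, lakehouse, shortcut and file names are all made up for illustration, and the helper `onelake_uri` is my own, not part of any Fabric library. It only builds the OneLake ABFS path; the actual Spark read is left in comments because it only works inside a Fabric notebook, where `spark` is predefined.

```python
def onelake_uri(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an ABFS URI for something stored in (or shortcut into) a lakehouse.

    Hypothetical helper: it assumes the usual OneLake path layout of
    abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/<path>.
    """
    relative_path = relative_path.lstrip("/")
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/{relative_path}"
    )


# Assumed names: workspace "Sales", lakehouse "Bronze", shortcut "adls_raw".
uri = onelake_uri("Sales", "Bronze", "Files/adls_raw/orders/orders.csv")
print(uri)

# Inside a Fabric notebook the file behind the shortcut reads like any other file:
# df = spark.read.option("header", "true").csv(uri)
# With the lakehouse attached as the default one, a relative path should also work:
# df = spark.read.option("header", "true").csv("Files/adls_raw/orders/orders.csv")
```

The point is that nothing in the path gives away that `adls_raw` is a shortcut: the notebook treats it like a normal folder, while the data stays in the storage account.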
Another use case is when you link to tables from another lakehouse/warehouse. Suppose another team has created a warehouse and it contains a couple of tables that would be pretty interesting for your analysis. Instead of copying them over (or redoing the whole ETL process with the additional risk of errors so numbers don’t match), you can just link to those tables and use them in your lakehouse as if they are part of it.
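If you’d rather script such a table shortcut than click through the UI, the sketch below shows roughly what the request could look like. It reflects the OneLake shortcuts REST API as I understand it (a POST to the `shortcuts` endpoint of the lakehouse item, with a `oneLake` target), so verify the details against the official documentation before relying on it. All IDs and names are placeholders, the function `table_shortcut_request` is hypothetical, and the code only builds the request; it doesn’t send anything.

```python
import json


def table_shortcut_request(workspace_id, lakehouse_id, shortcut_name,
                           source_workspace_id, source_item_id, source_path):
    """Return (url, body) for creating a shortcut to a table in another item.

    Assumption: the endpoint and body shape follow the Fabric "Create Shortcut"
    REST API; check the official docs for the current contract.
    """
    url = (
        "https://api.fabric.microsoft.com/v1/"
        f"workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    )
    body = {
        "path": "Tables",        # where the shortcut appears in my lakehouse
        "name": shortcut_name,   # name of the shortcut as shown in the lakehouse
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,   # the other team's warehouse/lakehouse
                "path": source_path,        # path of the table inside that item
            }
        },
    }
    return url, body


# Placeholder IDs and a made-up table path, purely for illustration.
url, body = table_shortcut_request(
    "my-workspace-id", "my-lakehouse-id", "dim_customer",
    "their-workspace-id", "their-warehouse-id", "Tables/dbo/dim_customer",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending it would still require an authenticated POST with a valid bearer token, which I’ve left out on purpose.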
This might be extra interesting if you’re building a data mesh: shortcuts can be a way for you to expose your data products.
I’ve written about shortcuts in the article What is OneLake in Microsoft Fabric? but you can also check out the official documentation.