A while ago I had a little blog post series about cool stuff in Snowflake. I’m doing a similar series now, but this time for Microsoft Fabric. I’m not going to cover the basics of Fabric; hundreds of bloggers have already done that. Instead, I’m going to cover little bits and pieces that I find interesting, that are similar to Snowflake features, or that are an improvement over “regular” SQL Server or related products.
In this blog post I’m going to talk about shortcuts. They’re a very interesting feature of Fabric that lets you reference data (either in a table or in a file) without actually copying the data around. Why is this useful? The architecture of a “modern data platform” nowadays looks something like this (a bit of an oversimplification and an overdramatization at the same time):
There are multiple layers where data is stored and transformed. If you’re unlucky, a data vault with 50 extra layers is thrown into the mix as well. I’ve seen large clients with 7 or more layers in their data platform. Between layers, data is transformed from one format to another, or business calculations take place. My point is that there are multiple copies of the same data. With shortcuts in Fabric, you can try to avoid some of those copies.
For example, suppose you have some files sitting in an Azure Data Lake Storage container. Instead of copying them over to Fabric and paying twice for the storage, you can just create a shortcut:
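Shortcuts don't have to be created through the user interface. As a rough sketch, assuming the OneLake shortcuts endpoint of the Fabric REST API, the request for an Azure Data Lake Storage shortcut could be put together like this. The function name, the IDs, the storage account and the connection ID are all made-up placeholders, and the snippet only builds the request; it doesn't send anything.

```python
import json

# Base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def adls_shortcut_request(workspace_id, lakehouse_id, name,
                          storage_account, subpath, connection_id,
                          parent_path="Files"):
    """Return (url, body) for a POST that would create an ADLS Gen2 shortcut.

    Hypothetical helper: all arguments are placeholders you supply yourself.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": parent_path,   # where the shortcut appears in the lakehouse
        "name": name,          # name of the shortcut itself
        "target": {
            "adlsGen2": {
                "location": f"https://{storage_account}.dfs.core.windows.net",
                "subpath": subpath,              # container and folder to link to
                "connectionId": connection_id,   # existing Fabric connection (assumed)
            }
        },
    }
    return url, body


if __name__ == "__main__":
    # Placeholder values, purely for illustration.
    url, body = adls_shortcut_request(
        workspace_id="<workspace-guid>",
        lakehouse_id="<lakehouse-guid>",
        name="raw_sales",
        storage_account="mystorageaccount",
        subpath="/landing/sales",
        connection_id="<connection-guid>",
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

To actually create the shortcut you would send this body with a bearer token for the Fabric API. Check the official documentation for the exact payload before relying on it.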
If you’re linking to files, you’ll see them pop up in the Files section of the lakehouse (with the little paperclip icon):
Some of those files are quite big to copy over, especially if you have a process that runs each day. Now I can refer to those files (for example in a notebook) without having to duplicate them. The comparable feature in Snowflake is an external stage, which links to files outside of Snowflake; you can then ingest the data using a COPY INTO statement, for example.
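To give an idea of what "referring to those files" looks like in a notebook: files behind a shortcut are addressed like any other file in the lakehouse. Here's a minimal sketch that builds the OneLake path for such a file. The workspace, lakehouse and shortcut names are invented for the example, and the helper itself is hypothetical.

```python
# Host name of the OneLake endpoint.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_files_path(workspace, lakehouse, *parts):
    """Build an abfss:// path to something under the Files section of a lakehouse.

    Hypothetical helper; 'parts' are folder/file names below Files,
    for example the shortcut name followed by a file name.
    """
    cleaned = [p.strip("/") for p in parts]      # avoid doubled slashes
    relative = "/".join(["Files", *cleaned])
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative}"


if __name__ == "__main__":
    # 'raw_sales' is the (made-up) shortcut pointing to the storage container.
    path = onelake_files_path("MyWorkspace", "MyLakehouse", "raw_sales", "orders.csv")
    print(path)
    # In a Fabric notebook, where a 'spark' session is already available,
    # the file could then be read with something like:
    #   df = spark.read.csv(path, header=True)
```

The data stays where it is in the storage account; the path just goes through the shortcut.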
Another use case is linking to tables from another lakehouse or warehouse. Suppose another team has created a warehouse and it contains a couple of tables that would be pretty interesting for your analysis. Instead of copying them over (or redoing the whole ETL process, with the additional risk of errors that make the numbers not match), you can just link to those tables and use them in your lakehouse as if they were part of it.
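For this table scenario, the shortcut target is another Fabric item instead of external storage. A small sketch of what the request body might look like, again assuming the OneLake shortcuts REST API; the function name, the IDs and the table path are placeholders, not values from a real environment.

```python
import json


def onelake_table_shortcut_body(name, source_workspace_id, source_item_id,
                                source_table_path):
    """Return the body for a shortcut that points to a table in another item.

    Hypothetical helper: the source IDs and path identify the other team's
    lakehouse or warehouse table and must be supplied by you.
    """
    return {
        "path": "Tables",    # the shortcut shows up under Tables in your lakehouse
        "name": name,
        "target": {
            "oneLake": {
                "workspaceId": source_workspace_id,
                "itemId": source_item_id,
                "path": source_table_path,   # location of the table in the source item
            }
        },
    }


if __name__ == "__main__":
    body = onelake_table_shortcut_body(
        name="DimCustomer",
        source_workspace_id="<other-workspace-guid>",
        source_item_id="<other-warehouse-guid>",
        source_table_path="Tables/dbo/DimCustomer",  # assumed layout
    )
    print(json.dumps(body, indent=2))
```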
This might be extra interesting if you’re building a data mesh: shortcuts can be a way for you to expose your data products.
I’ve written about shortcuts in the article What is OneLake in Microsoft Fabric? but you can also check out the official documentation.