Before we create pipelines to fetch data from the REST API, we need to create a helper pipeline that will fetch a new access token. This pipeline has the following format:
We need two linked services: one for the Azure SQL DB and one for the REST API. The latter takes the following configuration:
Make sure to set the Authentication Type to Anonymous. We will pass tokens to authenticate ourselves. The Base URL – or the start URL as Exact Online calls it – depends on what country you’re in. You can find a list here. Disable server certificate validation. If you want to access the XML API, you need an HTTP linked service and its configuration is exactly the same as the REST linked service. You cannot retrieve XML data from the REST API, as the REST connector in ADF only supports JSON.
The Azure SQL DB linked service looks like this:
There are different methods to authenticate to the database. In my case I’ve chosen for Managed Identity, which means ADF will try to connect using its own object ID. To make this succeed, we need to add ADF as a user to the database. You can use the following script:
CREATE USER [adf_name] FROM EXTERNAL PROVIDER; -- where adf_name is the name of your ADF instance ALTER ROLE [db_datareader] ADD MEMBER [deparco-adf-dev]; ALTER ROLE [db_datawriter] ADD MEMBER [deparco-adf-dev]; ALTER ROLE [db_ddladmin] ADD MEMBER [deparco-adf-dev];
In the script the ADF account is given read/write access and also the ability to execute DDL statements (needed to create tables or truncate them). Once the linked services are created, we need a SQL dataset for the refresh token:
First step of the pipeline is to retrieve the old refresh token from the database using the linked service and dataset we just created:
Then we store the refresh token in a variable:
The expression: @activity(‘Get Last Refresh Token’).output.firstRow.RefreshToken
Next we use a Web Activity to request the new tokens:
Both the refresh token and access token are saved to variables. The expressions:
Finally both tokens are written to the database so they can easily be retrieved by other pipelines. The reason we’re using a database (or any other storage) in ADF a child pipeline cannot return output values to the parent pipeline, so we’re persisting everything. To do this, we use the Stored Procedure Activity and some dynamic SQL:
Set sp_execute_sql as the stored proc name and use the following expression for the stmt parameter:
@concat('TRUNCATE TABLE etl.RefreshToken; TRUNCATE TABLE etl.AccessToken; INSERT INTO etl.RefreshToken([RefreshToken]) VALUES(''',variables('refreshtoken_new'),'''); INSERT INTO etl.AccessToken([AccessToken]) VALUES(''',variables('accesstoken'),''');')
Both the table for the access token and the refresh token are truncated and the new values are inserted. They can now be retrieved from other pipelines.
Related Posts
I recently read the book Agile Data Warehouse Design - Collaborative Dimensional Modeling, from Whiteboard…
You can find the slides for the session Building the €100 data warehouse with the…
I was asked to do a review of the book Microsoft Power BI Performance Best…
This is a quick blog post, mainly so I have the code available if I…
Praise whatever deity you believe in, because it's finally here, a tenant switcher for Microsoft…
This book was making its rounds on social media, and the concept seems interesting enough…
View Comments
Another option would be to put and get the refresh token in Key Vault
Sure, any secure storage will do.
This is incredible. Really appreciate the detailed breakdown of the steps.
One question I have is: how do we grab the DB stored access_token to then authenticate the Linked Service in ADF? And set up a mechanism so that it's automated.
You don't need to authenticate the linked service. You only need the access token when you're doing a REST/XML API call in a pipeline. You can find an example here: https://sqlkover.com/azure-data-factory-and-the-exact-online-rest-api-retrieving-the-divisions/
Hello Koen,
First of all thank you for the article and detailed explanation.
I have some question about refresh token. Eveytime we run this pipeline, it will get new access token ( which expires in 10 min) and does it also retrive new refresh token (accoridng to doc it expires in 30 days)
When you request a new access token you will also get a new refresh token. You have to use this new refresh token the next time you request a new access token. So even though a request token is valid for 30 days, it will most likely be refreshed more often than that.
Keep in mind that an access token is valid for 10 min, but you can only request a new one after 8.5 min have passed (not so sure about this interval, but you can find it in the docs). This means that refreshing it in every pipeline is probably not a good idea.