Categories: Azure Data Factory

Azure Data Factory and the Exact Online REST API – Dealing with Pagination

Some endpoints don’t have that much data, both others certainly will, for example financial transactions. The REST API limits the number of records it returns, and for most endpoints this is 60 records. After the first 60 records, you get a link at the bottom of the response that points to the next 60 records. This concept is called pagination and is common with REST APIs. There are some endpoints in the Exact Online REST API which are also available as a bulk endpoint. Those return 1,000 records instead of 60, so if you can use them, it would be in your best interest to do so.

Lucky for us, ADF has built-in pagination support. At the end of the JSON response, some indicator to the next set of records will be provided. You only have to tell ADF what type of pointer it is. In the case of the Exact Online, it’s an absolute URL.

This absolute URL is returned in the “__next” child node. When you open up the source tab of the Copy Activty, you can configure this in the pagination rules:

The “d” node is the root node of the JSON file. Then you have “results” (which you should’ve seen in the mapping), and “__next” is a final node that is only present when there’s pagination. The “results” also has a “__metadata” node that you might want to skip in the mapping pane.

And that’s it. ADF will take care of the pagination for you. Every time it finds a “__next” node at the end of the response, it will make another call to the REST API until the node is absent. So there’s no harm in configurating pagination for an endpoint of which you know it currently doesn’t use pagination. ADF will only make one call and immediately realize it doesn’t need to make more calls. With pagination, ADF can make many calls for a specific endpoint and certainly to the whole REST API when you’re running multiple pipelines in parallel. And guess what, the number of API calls per minute is limited as well. For example, you can only do 60 calls per company, per minute. You can limit the number of calls by using a bulk endpoint if possible. ADF has to do less calls to get the full dataset. Another option is toying with the request interval property.

By default, ADF will wait for 10ms between two requests to the API. If you set this to 1 second for example, then ADF can only make 60 calls maximum per minute.

One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file:

I might’ve configured something wrong, but when I changed to a CSV file there weren’t any problems, so I just kept on using the CSV file format instead of trying to figure out if I could fix the JSON.

Related Posts

  1. Introduction
  2. Fetching your first refresh/access tokens
  3. Getting a new access token from ADF
  4. Retrieving the Divisions
  5. Retrieving Data from an Endpoint
  6. Getting around the 10 Minute Limit

------------------------------------------------
Do you like this blog post? You can thank me by buying me a beer 🙂
Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight in their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.

View Comments

  • "One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file"

    I have exact the same problem. Did you found a solution (other than CSV) after you wrote this blogpost?

    • Hi Ralph,

      I didn't look further. One option I might think of is after you've written the data to the (invalid) JSON file, you pre-process it with an Azure Function and you remove the incorrect bits. I wasn't in the mood to write such a function...

      Koen

      • Ah just found the solution. In the copy acitvity, select the "Sink" tab, change the "File pattern" to "Array of objects".
        If you do that, a new array is created and every page of your API (the response) is then an object in that new array (seperated with a comma as it should be).
        So simple fix. Took me a while to find it though :-)

  • Hello
    How do you use pagination for SOAP APis in ADF
    I would appreciate any recommendations.
    Thank you.

  • It seems like the Pagination Rules section is updated in ADF. If you want to do the same as Koen did in this blogpost you can select AbsoluteUrl from the dropdown under Name and leave the input field blank, under value select None from the dropdown and put $.d.__next in the input field.

    • Thank you for your reply
      I have done something similar for REST APIS but I am having problems with SOAP APIS since I get a resultId which is different for each page so in order to get the next page I need to include the resultId to the body call to get next page.
      Is there any way to accomplish this in ADF

      • Maybe it could work if you store the resultId into a variable after a SOAP API call and then use it in the next iteration?

        • How can I do that?
          Sorry for asking many questions, I'm new to ADF and been struggling with this for a while.
          Thank you.

          • If you have a variable for the content of the body and for the result id, you could loop with a while (as long as a result id is returned) and every loop you put the result id you received into your body variable.

  • We have added pagination but its not iterating only 500 records are coming . Is there any other ways to iterate the copy activity ?

    • In the response, is there anything related to pagination? For example a page number, or a link to the next page?
      Is there a possibility to set an offset or something like that?

      • Hi Koen,

        Thanks for the reply . We found the solution . Needs to define without $ in pagination rules

        pageDetails.nextPageURL before we have defined as $.pageDetails.nextPageUrl

Recent Posts

Book Review – Agile Data Warehouse Design

I recently read the book Agile Data Warehouse Design - Collaborative Dimensional Modeling, from Whiteboard…

1 month ago

Cloudbrew 2024 – Slides

You can find the slides for the session Building the €100 data warehouse with the…

1 month ago

Book Review – Microsoft Power BI Performance Best Practices

I was asked to do a review of the book Microsoft Power BI Performance Best…

2 months ago

Create a Numbers Table in Power Query

This is a quick blog post, mainly so I have the code available if I…

2 months ago

Microsoft finally adds Tenant Switcher for Fabric / Power BI

Praise whatever deity you believe in, because it's finally here, a tenant switcher for Microsoft…

2 months ago

Book Review – Humanizing Data Strategy by Tiankai Feng

This book was making its rounds on social media, and the concept seems interesting enough…

2 months ago