Categories: Azure Data Factory

Azure Data Factory and the Exact Online REST API – Dealing with Pagination

Some endpoints don’t have that much data, both others certainly will, for example financial transactions. The REST API limits the number of records it returns, and for most endpoints this is 60 records. After the first 60 records, you get a link at the bottom of the response that points to the next 60 records. This concept is called pagination and is common with REST APIs. There are some endpoints in the Exact Online REST API which are also available as a bulk endpoint. Those return 1,000 records instead of 60, so if you can use them, it would be in your best interest to do so.

Lucky for us, ADF has built-in pagination support. At the end of the JSON response, some indicator to the next set of records will be provided. You only have to tell ADF what type of pointer it is. In the case of the Exact Online, it’s an absolute URL.

This absolute URL is returned in the “__next” child node. When you open up the source tab of the Copy Activty, you can configure this in the pagination rules:

The “d” node is the root node of the JSON file. Then you have “results” (which you should’ve seen in the mapping), and “__next” is a final node that is only present when there’s pagination. The “results” also has a “__metadata” node that you might want to skip in the mapping pane.

And that’s it. ADF will take care of the pagination for you. Every time it finds a “__next” node at the end of the response, it will make another call to the REST API until the node is absent. So there’s no harm in configurating pagination for an endpoint of which you know it currently doesn’t use pagination. ADF will only make one call and immediately realize it doesn’t need to make more calls. With pagination, ADF can make many calls for a specific endpoint and certainly to the whole REST API when you’re running multiple pipelines in parallel. And guess what, the number of API calls per minute is limited as well. For example, you can only do 60 calls per company, per minute. You can limit the number of calls by using a bulk endpoint if possible. ADF has to do less calls to get the full dataset. Another option is toying with the request interval property.

By default, ADF will wait for 10ms between two requests to the API. If you set this to 1 second for example, then ADF can only make 60 calls maximum per minute.

One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file:

I might’ve configured something wrong, but when I changed to a CSV file there weren’t any problems, so I just kept on using the CSV file format instead of trying to figure out if I could fix the JSON.

Related Posts

------------------------------------------------
Do you like this blog post? You can thank me by buying me a beer 🙂

Koen Verbeeck

Koen Verbeeck is a Microsoft Business Intelligence consultant at AE, helping clients to get insight in their data. Koen has a comprehensive knowledge of the SQL Server BI stack, with a particular love for Integration Services. He's also a speaker at various conferences.

Next Azure Data Factory and the Exact Online REST API – Getting around the 10 Minute Limit »

Previous « Azure Data Factory and the Exact Online REST API – Retrieving Data from an Endpoint

View Comments

Ralph says:

September 8, 2021 at 5:26 pm

"One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file"

I have exact the same problem. Did you found a solution (other than CSV) after you wrote this blogpost?
- Koen Verbeeck says:
  
  September 9, 2021 at 8:12 am
  
  Hi Ralph,
  
  I didn't look further. One option I might think of is after you've written the data to the (invalid) JSON file, you pre-process it with an Azure Function and you remove the incorrect bits. I wasn't in the mood to write such a function...
  
  Koen
  - Ralph says:
    
    September 9, 2021 at 1:12 pm
    
    Ah just found the solution. In the copy acitvity, select the "Sink" tab, change the "File pattern" to "Array of objects".
    If you do that, a new array is created and every page of your API (the response) is then an object in that new array (seperated with a comma as it should be).
    So simple fix. Took me a while to find it though :-)
    - Koen Verbeeck says:
      
      September 9, 2021 at 7:10 pm
      
      Alright, good to know. Thanks for posting back!
Erald Dervishi says:

December 29, 2021 at 7:50 pm

Hello
How do you use pagination for SOAP APis in ADF
I would appreciate any recommendations.
Thank you.
- Koen Verbeeck says:
  
  December 30, 2021 at 10:48 am
  
  Hi Erald,
  SOAP is currently not directly supported, but you can use the HTTP connector:
  https://www.how2code.info/en/blog/how-to-connect-to-a-soap-api-from-azure-data-factory/
  However, this connector doesn't have built in paging, so you'll need to implement it yourself.
  
  Regards,
  Koen
  - Erald Dervishi says:
    
    December 30, 2021 at 3:05 pm
    
    Thank you for your reply,
    I have read this article suggested by you, and I am able to get data using Web activity, the only issue I am having is how to use pagination, during my research I didn't find anything helpful and since I am new to ADF not sure how to implement in Azure for SOAP APIS.
Johan van den Brink says:

January 19, 2022 at 12:12 pm

It seems like the Pagination Rules section is updated in ADF. If you want to do the same as Koen did in this blogpost you can select AbsoluteUrl from the dropdown under Name and leave the input field blank, under value select None from the dropdown and put $.d.__next in the input field.
- Erald Dervishi says:
  
  January 19, 2022 at 3:01 pm
  
  Thank you for your reply
  I have done something similar for REST APIS but I am having problems with SOAP APIS since I get a resultId which is different for each page so in order to get the next page I need to include the resultId to the body call to get next page.
  Is there any way to accomplish this in ADF
  - Johan van den Brink says:
    
    January 21, 2022 at 2:32 pm
    
    Maybe it could work if you store the resultId into a variable after a SOAP API call and then use it in the next iteration?
    - Erald Dervishi says:
      
      January 22, 2022 at 3:34 pm
      
      How can I do that?
      Sorry for asking many questions, I'm new to ADF and been struggling with this for a while.
      Thank you.
      - Johan van den Brink says:
        
        January 25, 2022 at 12:31 pm
        
        If you have a variable for the content of the body and for the result id, you could loop with a while (as long as a result id is returned) and every loop you put the result id you received into your body variable.
Selva says:

April 5, 2022 at 12:32 pm

We have added pagination but its not iterating only 500 records are coming . Is there any other ways to iterate the copy activity ?
- Koen Verbeeck says:
  
  April 5, 2022 at 2:56 pm
  
  In the response, is there anything related to pagination? For example a page number, or a link to the next page?
  Is there a possibility to set an offset or something like that?
  - selvackp says:
    
    April 6, 2022 at 4:12 am
    
    Hi Koen,
    
    Thanks for the reply . We found the solution . Needs to define without $ in pagination rules
    
    pageDetails.nextPageURL before we have defined as $.pageDetails.nextPageUrl
    - Koen Verbeeck says:
      
      April 6, 2022 at 10:04 am
      
      Great! Glad you found the solution and thanks for posting back.