Some endpoints don’t have that much data, both others certainly will, for example financial transactions. The REST API limits the number of records it returns, and for most endpoints this is 60 records. After the first 60 records, you get a link at the bottom of the response that points to the next 60 records. This concept is called pagination and is common with REST APIs. There are some endpoints in the Exact Online REST API which are also available as a bulk endpoint. Those return 1,000 records instead of 60, so if you can use them, it would be in your best interest to do so.
Lucky for us, ADF has built-in pagination support. At the end of the JSON response, some indicator to the next set of records will be provided. You only have to tell ADF what type of pointer it is. In the case of the Exact Online, it’s an absolute URL.
This absolute URL is returned in the “__next” child node. When you open up the source tab of the Copy Activty, you can configure this in the pagination rules:
The “d” node is the root node of the JSON file. Then you have “results” (which you should’ve seen in the mapping), and “__next” is a final node that is only present when there’s pagination. The “results” also has a “__metadata” node that you might want to skip in the mapping pane.
And that’s it. ADF will take care of the pagination for you. Every time it finds a “__next” node at the end of the response, it will make another call to the REST API until the node is absent. So there’s no harm in configurating pagination for an endpoint of which you know it currently doesn’t use pagination. ADF will only make one call and immediately realize it doesn’t need to make more calls. With pagination, ADF can make many calls for a specific endpoint and certainly to the whole REST API when you’re running multiple pipelines in parallel. And guess what, the number of API calls per minute is limited as well. For example, you can only do 60 calls per company, per minute. You can limit the number of calls by using a bulk endpoint if possible. ADF has to do less calls to get the full dataset. Another option is toying with the request interval property.
By default, ADF will wait for 10ms between two requests to the API. If you set this to 1 second for example, then ADF can only make 60 calls maximum per minute.
One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file:
I might’ve configured something wrong, but when I changed to a CSV file there weren’t any problems, so I just kept on using the CSV file format instead of trying to figure out if I could fix the JSON.
Related Posts
I recently read the book Agile Data Warehouse Design - Collaborative Dimensional Modeling, from Whiteboard…
You can find the slides for the session Building the €100 data warehouse with the…
I was asked to do a review of the book Microsoft Power BI Performance Best…
This is a quick blog post, mainly so I have the code available if I…
Praise whatever deity you believe in, because it's finally here, a tenant switcher for Microsoft…
This book was making its rounds on social media, and the concept seems interesting enough…
View Comments
"One problem that ADF seems to have is that when there are multiple pages, ADF just appends the different responses into one file. This leads to an invalid JSON file"
I have exact the same problem. Did you found a solution (other than CSV) after you wrote this blogpost?
Hi Ralph,
I didn't look further. One option I might think of is after you've written the data to the (invalid) JSON file, you pre-process it with an Azure Function and you remove the incorrect bits. I wasn't in the mood to write such a function...
Koen
Ah just found the solution. In the copy acitvity, select the "Sink" tab, change the "File pattern" to "Array of objects".
If you do that, a new array is created and every page of your API (the response) is then an object in that new array (seperated with a comma as it should be).
So simple fix. Took me a while to find it though :-)
Alright, good to know. Thanks for posting back!
Hello
How do you use pagination for SOAP APis in ADF
I would appreciate any recommendations.
Thank you.
Hi Erald,
SOAP is currently not directly supported, but you can use the HTTP connector:
https://www.how2code.info/en/blog/how-to-connect-to-a-soap-api-from-azure-data-factory/
However, this connector doesn't have built in paging, so you'll need to implement it yourself.
Regards,
Koen
Thank you for your reply,
I have read this article suggested by you, and I am able to get data using Web activity, the only issue I am having is how to use pagination, during my research I didn't find anything helpful and since I am new to ADF not sure how to implement in Azure for SOAP APIS.
It seems like the Pagination Rules section is updated in ADF. If you want to do the same as Koen did in this blogpost you can select AbsoluteUrl from the dropdown under Name and leave the input field blank, under value select None from the dropdown and put $.d.__next in the input field.
Thank you for your reply
I have done something similar for REST APIS but I am having problems with SOAP APIS since I get a resultId which is different for each page so in order to get the next page I need to include the resultId to the body call to get next page.
Is there any way to accomplish this in ADF
Maybe it could work if you store the resultId into a variable after a SOAP API call and then use it in the next iteration?
How can I do that?
Sorry for asking many questions, I'm new to ADF and been struggling with this for a while.
Thank you.
If you have a variable for the content of the body and for the result id, you could loop with a while (as long as a result id is returned) and every loop you put the result id you received into your body variable.
We have added pagination but its not iterating only 500 records are coming . Is there any other ways to iterate the copy activity ?
In the response, is there anything related to pagination? For example a page number, or a link to the next page?
Is there a possibility to set an offset or something like that?
Hi Koen,
Thanks for the reply . We found the solution . Needs to define without $ in pagination rules
pageDetails.nextPageURL before we have defined as $.pageDetails.nextPageUrl
Great! Glad you found the solution and thanks for posting back.