How to avoid runtime errors when fetching data?

I’ve written a notebook which fetches data from a remote CSV and enriches the dataset with calls to the Wikidata API. In many cases I get runtime errors. Do you have any ideas for avoiding this kind of error? I don’t have any background in JavaScript, so any feedback is welcome. I suspect that the use of the arquero library in the get_instanceof_fromsitelink() function causes some of the runtime errors, but this is just an intuition.

What notebook are you looking at? I searched the site for “get_instanceof_fromsitelink” and I bet you’re talking about What kind of articles have you created ? / PAC | Observable, but for future reference it helps us if you include a link.

A good first step for getting more detail about errors is to open your browser’s “console”. In Chrome on Mac you can press Cmd-Opt-J or go to View → Developer → JavaScript Console.

When you look at that on your notebook, you see a bunch of error messages like this about network requests — happening before Arquero comes into the picture at all, I think:

Access to fetch at ‘https://www.wikidata.org/w/api.php?action=wbgetentities&titles=Grigol&sites=frwiki&format=json&normalize=true&origin=*’ from origin ‘https://pac02.static.observableusercontent.com’ has been blocked by CORS policy: No ‘Access-Control-Allow-Origin’ header is present on the requested resource. If an opaque response serves your needs, set the request’s mode to ‘no-cors’ to fetch the resource with CORS disabled.

CORS is a security policy that restricts how sites can fetch files and data from other sites; it’s a frequent headache when fetching data on the web, and there are some notebooks about using proxies like https://cors-anywhere.herokuapp.com/ to get around it, though I’m not seeing a great concise one to link you to right now.

From googling, it seems like Wikidata should have CORS enabled, so we shouldn’t have to use a proxy. And if I take a URL that failed, I can make an individual request to Wikidata without a CORS error.
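For instance, a minimal sketch of that kind of individual request, with the title and site parameters simply copied from the failing URL above:

```js
// A single wbgetentities request with origin=*, which asks Wikidata to send
// the CORS headers the browser needs. The title "Grigol" and site "frwiki"
// are just taken from the failing URL quoted above.
const url = "https://www.wikidata.org/w/api.php"
  + "?action=wbgetentities&sites=frwiki&titles=Grigol"
  + "&format=json&normalize=true&origin=*";

const response = await fetch(url);
const data = await response.json(); // roughly {entities: {...}, success: 1}
```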

Looking at the “Network” tab of the Chrome developer tools (same sidebar as the console we brought up earlier), it looks like your notebook makes about 1,567 requests to wikidata.org every time it loads. That’s a lot!! I’d guess they might be rate-limiting you and it comes out as a CORS error, but I’m not sure.

I’d take a look at whether you’re fetching data inside a loop or something like that (like making a request for every item in an array). Maybe you don’t need all those requests. Or maybe you’re going to have to do something like batching them and saving the results as JSON or CSV so that you don’t have to make every request every time the notebook loads.
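If it helps, here’s a rough sketch of the save-the-results idea. It assumes the enrichment step has already produced an array of plain objects, which I’m calling `results` here (that name is just a placeholder), and turns it into a JSON download that you could then re-attach to the notebook with FileAttachment:

```js
// Rough sketch of "fetch once, save, reuse": serialize the already-fetched
// results and offer them as a download instead of refetching on every load.
// `results` is a placeholder for whatever array the enrichment step produces.
const blob = new Blob([JSON.stringify(results, null, 2)],
                      {type: "application/json"});
const link = document.createElement("a");
link.href = URL.createObjectURL(blob);
link.download = "wikidata-results.json";
link.textContent = "Download wikidata-results.json";
// In an Observable cell, returning `link` displays a clickable download link.
```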

Sorry, I haven’t really resolved your problem, but that’s all I can do for now — I don’t even have presents for my parents! Hopefully it at least gives you some ideas of where to look.


Thanks for your feedback.

My URL is What kind of articles have you created ? / PAC / Observable.

I’ll look further to understand the bug. Thanks for your tips.

The MediaWiki API action wbgetentities that you’re using allows you to query multiple IDs in a single request, by concatenating the IDs with |. Example:

https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q5|Q863247&format=json&origin=*

Check the documentation for more details about the action parameters: MediaWiki API help - Wikidata
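For example, something along these lines (Q5 and Q863247 are just the IDs from the example URL above):

```js
// Sketch of one batched request: join the Wikidata IDs with "|" so a single
// wbgetentities call covers all of them.
const ids = ["Q5", "Q863247"];
const url = "https://www.wikidata.org/w/api.php"
  + "?action=wbgetentities"
  + `&ids=${ids.join("|")}`
  + "&format=json&origin=*";

const response = await fetch(url);
const {entities} = await response.json(); // one entry per requested ID
```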


Here is a new function which sends an array of articles to the Wikidata API: How to get Wikidata claims from Wikipedia sitelinks using Wikidata API ? / PAC / Observable.

Now I’m looking at how to deal with cases where there are more than 50 articles.

Thanks for your feedback. It was really useful for my investigation.

Here is the solution: How to fetch Wikidata claims from the list of pages created ? / PAC / Observable

I’m splitting the arrays into chunks of 50 before calling the API. Hopefully it will work better.
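In case it’s useful to anyone else, the chunking looks roughly like this (`titles` is a placeholder for the array of Wikipedia page titles):

```js
// Sketch of chunked requests: split the titles into groups of 50 and make
// one wbgetentities call per group, merging the returned entities.
// `titles` is a placeholder for the array of page titles to look up.
const CHUNK_SIZE = 50;
const chunks = [];
for (let i = 0; i < titles.length; i += CHUNK_SIZE) {
  chunks.push(titles.slice(i, i + CHUNK_SIZE));
}

const entities = {};
for (const chunk of chunks) {
  const url = "https://www.wikidata.org/w/api.php"
    + "?action=wbgetentities&sites=frwiki"
    + `&titles=${chunk.map(encodeURIComponent).join("|")}`
    + "&format=json&normalize=true&origin=*";
  const response = await fetch(url);
  const data = await response.json();
  Object.assign(entities, data.entities); // merge this chunk's results
}
```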