I am currently planning to use the JSZip library, which I think is able to unzip files.
If I can read the zip as an ArrayBuffer, as in this JSZip example:
https://stuk.github.io/jszip/documentation/examples/read-local-file-api.html
and chain that with the result of a GET request,
then I should be able to await the results somehow.
For example, to download this corpus and read the text files inside it:
https://github.com/nltk/nltk_data/blob/gh-pages/packages/corpora/abc.zip
Any other recommendations?
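One way to "await the results" is to write the GET-then-unzip chain as a single async function. This is a minimal sketch, not the approach used later in the thread: `readTextFromZip` and its `fetchImpl` parameter are hypothetical names, JSZip 3.x is assumed to be passed in (from `require("jszip")` in Node, or Observable's `require()` in a notebook), and the global `fetch` is used by default.

```javascript
// Hypothetical helper: GET a zip archive and return one entry's text.
// Assumes JSZip 3.x is passed in as `JSZip`; `fetchImpl` defaults to global fetch.
async function readTextFromZip(url, path, JSZip, fetchImpl = globalThis.fetch) {
  // Step 1: issue the GET request and fail loudly on a non-2xx status.
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  // Step 2: read the body as an ArrayBuffer, which JSZip.loadAsync accepts.
  const arrayBuffer = await response.arrayBuffer();
  const zip = await JSZip.loadAsync(arrayBuffer);
  // Step 3: look up the entry by its full path inside the archive.
  const entry = zip.file(path);
  if (entry === null) {
    throw new Error(`${path} not found in ${url}`);
  }
  // Decompress the entry and resolve with its contents as a string.
  return entry.async("string");
}
```

With JSZip in scope, it could be called as `await readTextFromZip(corpusUrl, "abc/rural.txt", JSZip)`, where `corpusUrl` is the archive's URL and `abc/rural.txt` is an assumed entry name; list the archive's contents to confirm the real paths.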
I think this approach works well so far:
The bulk of the code is:
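```javascript
// Fetch the zip as an ArrayBuffer (through the cors-anywhere proxy),
// unzip it with JSZip, then read one text file out of the archive.
// `abc_rural_science_choice` is a notebook cell defined elsewhere (presumably "rural" or "science").
d3
  .buffer('https://cors-anywhere.herokuapp.com/' + 'https://github.com/nltk/nltk_data/raw/gh-pages/packages/corpora/abc.zip')
  .then(arrayBuffer => {
    const zip = new JSZip();
    return zip.loadAsync(arrayBuffer);
  })
  .then(zip => {
    return zip.file(`abc/${abc_rural_science_choice}.txt`).async('string');
  })
```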
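To read all of the text files rather than one chosen entry, the loaded archive can be searched with a regular expression. A sketch under these assumptions: `zip` is the object resolved by `loadAsync` (JSZip 3.x, whose `file(regex)` returns an array of matching non-directory entries), and `readAllTextFiles` and its default `/\.txt$/` pattern are hypothetical.

```javascript
// Hypothetical helper: read every entry whose path matches `pattern` into a Map.
// Assumes `zip` is a JSZip 3.x instance already populated by loadAsync.
async function readAllTextFiles(zip, pattern = /\.txt$/) {
  // zip.file(regex) returns an array of matching file entries (directories excluded).
  const entries = zip.file(pattern);
  // Decompress all matching entries concurrently.
  const texts = await Promise.all(entries.map((entry) => entry.async("string")));
  // Key each text by the entry's full path inside the archive, e.g. "abc/rural.txt".
  return new Map(entries.map((entry, i) => [entry.name, texts[i]]));
}
```

It slots in as a replacement for the last step of the chain: `.then(zip => readAllTextFiles(zip))`.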
tom
September 29, 2018, 4:48pm
Yep! I was just putting together an example and you beat me to it. The only little pro tip I'd add is that you can skip the cors-anywhere step by using raw.githubusercontent.com:
This:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip
will work just as well as (and more reliably than) the cors-anywhere route. It's a little tricky to dig up this URL for binary files because GitHub doesn't show the Raw link that it does for text files, but using direct GitHub URLs removes a bit of complexity and means you don't have to rely on cors-anywhere.
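Since GitHub hides the Raw link for binary files, the raw.githubusercontent.com URL can also be derived from the normal github.com URL. This is a sketch, assuming the URL follows the `https://github.com/<owner>/<repo>/(raw|blob)/<ref>/<path>` layout seen in this thread; `toRawGitHubUrl` is a hypothetical helper, not a GitHub API.

```javascript
// Hypothetical helper: turn a github.com ".../raw/..." or ".../blob/..." file URL
// into the raw.githubusercontent.com form (the host that serves the CORS header).
function toRawGitHubUrl(githubUrl) {
  const url = new URL(githubUrl);
  if (url.hostname !== "github.com") {
    throw new Error(`Not a github.com URL: ${githubUrl}`);
  }
  // pathname looks like "/<owner>/<repo>/<raw|blob>/<ref>/<path...>"
  const [, owner, repo, kind, ...rest] = url.pathname.split("/");
  if ((kind !== "raw" && kind !== "blob") || rest.length < 2) {
    throw new Error(`Expected /<owner>/<repo>/raw|blob/<ref>/<path>: ${githubUrl}`);
  }
  // The raw host drops the "raw"/"blob" segment and keeps "<ref>/<path>".
  return `https://raw.githubusercontent.com/${owner}/${repo}/${rest.join("/")}`;
}
```

Applied to the corpus link from the question, either the `/blob/` or the `/raw/` form yields https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip.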
Another tiny thing is that you can use unpkg instead of bundle.run to load JSZip; you just need to target the UMD bundle provided in its dist directory. Here's an example:
This is consistent with Tom's tutorial on require/modules, right?
"Now that you've found the repository, look through its code: does it have a UMD or AMD build somewhere in its package that you just need to require?"
Great, I incorporated this change, thanks.
Great, thanks; so simple…
I thought I was already using "raw", because it appears in the URL:
https://github.com/nltk/nltk_data/raw/gh-pages/packages/corpora/abc.zip
However, I see that when I visit that URL, I am 302-redirected to this one, which uses the raw.githubusercontent.com host you suggested:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/abc.zip
The response to this true raw.githubusercontent.com request is a 200 with `Access-Control-Allow-Origin: *`, which avoids the CORS security exception.
I incorporated this change, thanks!
For others who visit this question, note the relevant help in the Introduction to Data notebook:
Now you have a link to your file like this:
https://gist.github.com/mbostock/4063570/raw/11847750012dfe5351ee1eb290d2a254a67051d0/flare.csv
Unfortunately, this link doesn't support CORS, so you'll need to replace the gist.github.com domain with gist.githubusercontent.com.
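The domain swap the notebook describes can be done with the standard `URL` class, which keeps the path (including the commit hash and filename) intact. A minimal sketch; `toCorsFriendlyGistUrl` is a hypothetical name, and it assumes the input is a gist "raw" link like the flare.csv example.

```javascript
// Hypothetical helper: swap gist.github.com for gist.githubusercontent.com,
// leaving the rest of the raw gist link unchanged.
function toCorsFriendlyGistUrl(gistRawUrl) {
  const url = new URL(gistRawUrl);
  if (url.hostname !== "gist.github.com") {
    throw new Error(`Not a gist.github.com URL: ${gistRawUrl}`);
  }
  // Only the host changes; path, query, and fragment are preserved.
  url.hostname = "gist.githubusercontent.com";
  return url.toString();
}
```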