A problem I often face is getting data from a private repo into Observable. I’ll have a Bitbucket repo at work that does the data wrangling of some raw data, for example, and I’ll want to pull a cleaned data file into Observable for visualizing. I could upload a version of it as a FileAttachment, but once it changes, I’ll need to rerun the script and update the attachment. I have seen examples in other notebooks of reading in files from GitHub links, like using Apache Arrow to access large datasets from a URL (awesome). This seems super helpful for accessing datasets larger than the FileAttachment size limit and for being more explicit about where the data is coming from.
Can I get some schooling about working with data repos and Observable? Are there ways to connect to private repositories for smoother data updates? Should I be looking into some sort of external hosting of data that Observable can access? If I do data wrangling in an external private repo, how would you suggest getting that data into Observable?
Hi Zach, you can use the GitHub API to access files in private repos. Here’s a good guide I found. You can store your API key as a secret in your account. Secrets can only be used in private notebooks, but I assume that is what you’re after.
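To make that a bit more concrete, here is a minimal sketch of what the request could look like, assuming GitHub's REST "repository contents" endpoint and a personal access token with read access to the repo. The function names (githubFileRequest, fetchGithubFile) and the owner, repo, and path values are made up for illustration, so swap in your own.

```javascript
// Build the URL and fetch options for reading one file through
// GitHub's REST API (GET /repos/{owner}/{repo}/contents/{path}).
// All parameter values are placeholders supplied by the caller.
function githubFileRequest({ owner, repo, path, ref = "main", token }) {
  // Encode each path segment but keep the slashes between folders.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const url =
    `https://api.github.com/repos/${owner}/${repo}/contents/${encodedPath}` +
    `?ref=${encodeURIComponent(ref)}`;
  return {
    url,
    options: {
      headers: {
        // Ask for the raw file body rather than base64-encoded JSON.
        Accept: "application/vnd.github.raw+json",
        Authorization: `Bearer ${token}`
      }
    }
  };
}

// Fetch the file as text and fail loudly on a non-2xx response.
async function fetchGithubFile(params) {
  const { url, options } = githubFileRequest(params);
  const resp = await fetch(url, options);
  if (!resp.ok) throw new Error(`GitHub API returned ${resp.status}`);
  return resp.text();
}
```

In a private notebook you would probably pass the token in from a secret, something like fetchGithubFile({owner: "my-org", repo: "my-repo", path: "data/clean.csv", token: await Secret("GITHUB_TOKEN")}), where GITHUB_TOKEN is whatever name you gave the secret. Because the request goes to the repo each time the cell runs, the notebook picks up new versions of the file without any change on the Observable side.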
Is this what you were looking for?
This seems promising. Maybe a similar method can work for Bitbucket repos?
I would think so. Bitbucket has an API that looks like it has methods to access files.
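I haven't tried this myself, but a rough sketch could look like the following, assuming Bitbucket Cloud's 2.0 REST API (the "src" endpoint returns the raw file contents) and an access token that is accepted as a Bearer token. The function name bitbucketFileRequest and all the parameter values are hypothetical. A self-hosted Bitbucket Server uses a different API, and app passwords use Basic auth rather than Bearer, so treat this as a starting point only.

```javascript
// Build the URL and fetch options for reading one file through
// Bitbucket Cloud's API:
//   GET /2.0/repositories/{workspace}/{repo_slug}/src/{commit}/{path}
// "commit" can be a commit hash or a simple branch name (assumption:
// branch names containing slashes may not resolve in this position).
function bitbucketFileRequest({ workspace, repoSlug, commit = "main", path, token }) {
  // Encode each path segment but keep the slashes between folders.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const url =
    `https://api.bitbucket.org/2.0/repositories/${workspace}/${repoSlug}` +
    `/src/${encodeURIComponent(commit)}/${encodedPath}`;
  return {
    url,
    // Assumes a token that Bitbucket accepts as a Bearer token.
    options: { headers: { Authorization: `Bearer ${token}` } }
  };
}

// Fetch the file as text and fail loudly on a non-2xx response.
async function fetchBitbucketFile(params) {
  const { url, options } = bitbucketFileRequest(params);
  const resp = await fetch(url, options);
  if (!resp.ok) throw new Error(`Bitbucket API returned ${resp.status}`);
  return resp.text();
}
```

As with GitHub, the token would live in a secret in a private notebook. I have not verified whether the browser allows this request cross-origin from a notebook.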
One problem can be CORS: browsers block cross-origin requests unless the server explicitly allows them. That is meant to protect you from malicious sites, but in this case it makes things unnecessarily difficult. There’s a clever proxy to work around that, though (not sure if you need it, but just in case).
That seems like a good option. It also reminds me of our self-hosted database proxy. Similarly, you could run a web server on your local machine and fetch files from that in Observable. (@visnup just showed me this; we should write it up as a tutorial notebook.)
If you’re in Chrome or Edge, you can connect to your local computer directly over HTTP. Open a bash terminal and go to a folder with some data (e.g. I’ve got aapl.csv in there). Then run:
npx http-server --cors
It’ll print the addresses it’s serving on (port 8080 by default), and then you can run in a notebook:
fetch("http://127.0.0.1:8080/aapl.csv").then((resp) => resp.text())
If you’re in Safari or Firefox, you have to connect over HTTPS. You can install ngrok in the terminal:
brew install ngrok
And then, with http-server still running, run:
ngrok http 8080
And it’ll give you a forwarding address you can use in the notebook to get your local files over HTTPS, something like:
fetch("https://b701-125-211-126-202.ngrok.io/aapl.csv").then(resp => resp.text())
With the ngrok approach, other people can also access your files with that forwarding address, at least until you quit ngrok. (It’ll give you a different address every time.)
Nice, I’ll have to check this stuff out. We’re certainly getting into territory I find hard to follow, so tutorial notebooks appreciated!
@tophtucker would all of this be much easier/already possible if it were a public repo? In my head it would be really helpful to point to a file in a repo and download it via a URL; then I could update the file in the repo without having to alter the Observable data connections. Sounds like the private nature introduces more challenges.
I can improve this, but as a placeholder, here’s a little video demo: Fetch a local file from a notebook / Toph Tucker / Observable