Private Data Repos and Observable

zachbogart · September 29, 2022, 9:20pm

Hi there,

A problem I often face is getting data from a private repo into Observable. I’ll have a Bitbucket repo at work that wrangles raw data, for example, and I’ll want to pull a cleaned data file into Observable for visualizing. I could upload a copy of it as a FileAttachment, but once it changes, I’ll need to rerun the script and update the attachment. In other notebooks I have seen examples of reading files from GitHub links, like using Apache Arrow to access large datasets from a URL (awesome). This seems super helpful for working with datasets larger than the FileAttachment size limit and for being more explicit about where the data comes from.

Can I get some schooling about working with data repos and Observable? Are there ways to connect to private repositories for smoother data updates? Should I be looking into some sort of external hosting of data that Observable can access? If I do data wrangling in an external private repo, how would you suggest getting that data into Observable?

Always learning,
Zach

eagereyes · September 30, 2022, 12:20am

Hi Zach, you can use the GitHub API to access files in private repos. Here’s a good guide I found. You can store your API key as a secret in your account. Secrets can only be used in private notebooks, but I assume that is what you’re after.
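
A rough sketch of what that could look like, assuming the GitHub REST "contents" endpoint and a personal access token. The owner, repo, path, and secret name are placeholders, not values from this thread:

```javascript
// Sketch: read a raw file from a private GitHub repo via the REST "contents" endpoint.
// owner, repo, path, ref, and token are placeholders supplied by the caller.
function githubFileRequest({ owner, repo, path, ref, token }) {
  // Encode each path segment but keep the slashes between them.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const url = new URL(`https://api.github.com/repos/${owner}/${repo}/contents/${encodedPath}`);
  if (ref) url.searchParams.set("ref", ref); // branch, tag, or commit SHA
  return {
    url: url.href,
    headers: {
      Accept: "application/vnd.github.raw+json", // ask for the raw file, not JSON metadata
      Authorization: `Bearer ${token}`
    }
  };
}

// fetchImpl is injectable so the function can be tried without network access.
async function fetchGithubFile(options, fetchImpl = fetch) {
  const { url, headers } = githubFileRequest(options);
  const resp = await fetchImpl(url, { headers });
  if (!resp.ok) throw new Error(`GitHub request failed with status ${resp.status}`);
  return resp.text();
}
```

In a private notebook, the token would presumably come from a secret, e.g. `fetchGithubFile({owner: "my-org", repo: "my-repo", path: "data/clean.csv", token: await Secret("GITHUB_TOKEN")})`, where `GITHUB_TOKEN` is a hypothetical secret name.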

Is this what you were looking for?

zachbogart · September 30, 2022, 12:24am

This seems promising. Maybe a similar method can work for Bitbucket repos?

eagereyes · September 30, 2022, 12:31am

I would think so. Bitbucket has an API that looks like it has methods to access files.
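
As a sketch only, assuming Bitbucket Cloud (not a self-hosted Bitbucket Server), its REST API 2.0 "src" endpoint, and Basic auth with an app password. The workspace, repo slug, commit, and credentials are all placeholders:

```javascript
// Sketch: build a request for a raw file via Bitbucket Cloud's REST API 2.0 "src" endpoint.
// Every argument is a placeholder; commit can be a commit hash or a simple branch name.
function bitbucketFileRequest({ workspace, repoSlug, commit, path, username, appPassword }) {
  // Encode each path segment but keep the slashes between them.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const url =
    `https://api.bitbucket.org/2.0/repositories/${workspace}/${repoSlug}` +
    `/src/${encodeURIComponent(commit)}/${encodedPath}`;
  // Basic auth is "username:password" encoded as base64.
  const credentials = btoa(`${username}:${appPassword}`);
  return { url, headers: { Authorization: `Basic ${credentials}` } };
}
```

The result can be passed to `fetch(url, {headers})`. Whether the browser is allowed to read the response depends on the CORS behavior mentioned below, so this may still need a proxy.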

One problem can be CORS, which is meant to protect you from cross-site scripting attacks, but in this case makes things unnecessarily difficult. There’s a clever proxy to work around that, though (not sure if you need it, but just in case).

tophtucker · September 30, 2022, 1:17am

That seems like a good option. It also reminds me of our self-hosted database proxy. Similarly, you could run a web server on your local machine and fetch files from that in Observable. (@visnup just showed me this; we should write it up as a tutorial notebook.)

If you’re in Chrome or Edge, you can connect to your local computer directly over HTTP. Open a bash terminal and go to a folder with some data (e.g. I’ve got aapl.csv in there). Then run:

npx http-server --cors

It’ll tell you your local IP, and then you can run in a notebook:

fetch("http://127.0.0.1:8080/aapl.csv").then((resp) => resp.text())
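
One caveat: `fetch` resolves even when the server answers with an error such as 404, so a small wrapper that checks `resp.ok` can make a mistyped filename easier to spot. A minimal sketch (the `fetchText` name is just for illustration):

```javascript
// Sketch: fetch a URL as text, failing loudly on HTTP error statuses.
// fetchImpl is injectable so the function can be tried without a running server.
async function fetchText(url, fetchImpl = fetch) {
  const resp = await fetchImpl(url);
  if (!resp.ok) {
    throw new Error(`Request for ${url} failed with status ${resp.status}`);
  }
  return resp.text();
}
```

In a notebook cell, the text could then be parsed with something like `d3.csvParse(await fetchText("http://127.0.0.1:8080/aapl.csv"), d3.autoType)`.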

If you’re in Safari or Firefox, you have to connect over HTTPS. You can install ngrok in the terminal:

brew install ngrok

And then, with http-server still running, run:

ngrok http 8080

And it’ll give you a forwarding address you can use in the notebook to get your local files over HTTPS, something like:

fetch("https://b701-125-211-126-202.ngrok.io/aapl.csv").then(resp => resp.text())

With the ngrok approach, other people can also access your files with that forwarding address, at least until you quit ngrok. (It’ll give you a different address every time.)
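
Since the address changes, it may help to keep the base URL in one place and build file URLs from it, so only one value needs updating per session. A small sketch (the `localFileUrl` helper is hypothetical):

```javascript
// Sketch: resolve a filename against a base address that changes between sessions.
function localFileUrl(base, filename) {
  // Ensure a trailing slash so the filename is appended rather than replacing a path segment.
  const normalizedBase = base.endsWith("/") ? base : base + "/";
  return new URL(filename, normalizedBase).href;
}
```

For example, `fetch(localFileUrl(base, "aapl.csv"))` works whether `base` is `http://127.0.0.1:8080` or the current ngrok forwarding address.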

zachbogart · September 30, 2022, 2:40am

Nice, I’ll have to check this stuff out. We’re certainly getting into territory I find hard to follow, so tutorial notebooks appreciated!

@tophtucker would all of this be much easier/already possible if it were a public repo? In my head it would be really helpful to point to a file in a repo and download it via a URL; then I could update the file in the repo without having to alter the Observable data connections. Sounds like the private nature introduces more challenges.

tophtucker · September 30, 2022, 7:19am

I can improve this, but as a placeholder, here’s a little video demo: Fetch a local file from a notebook / Toph Tucker / Observable
