Read data from a public Google Cloud Storage bucket

Hi, does anyone know if/how we can load data from a public Google Cloud Storage bucket into ObservableHQ?

2 Likes

Yes. There are many ways, depending on what kind of access control you want on it. Public read access is very easy; if the data is sensitive you might want to put it behind a login. What are your requirements?

It’s also relevant whether you want write access from Observable, or purely read access.

In case you did not know, Firebase Storage is a dedicated client-side wrapper for GCP buckets, so that’s the most full-featured approach. I demoed Firebase Storage on a Twitch stream yesterday:

I also have an example of using GCP Storage as a BigQuery cache here:

But it’s not necessary to go through Firebase if you don’t need all those features.

1 Like

Thanks @tomlarkworthy! For now, we only want to read data from a public GCP bucket, and we want to do that in a public ObservableHQ notebook (so there’s no current login requirement). Preferably, we could talk directly to Google without a Firebase intermediary, just to keep things simple.

I’ll look over the examples (I knew I should have watched that Twitch stream live :grin:). Do you know if we can use Google’s Node.js client libraries on ObservableHQ? https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-nodejs

The Node.js clients do not work in the browser (they use gRPC), but you can use the machine-generated GAPI clients… that’s how I access BigQuery from the notebook: Google API Client / Tom Larkworthy / Observable

It’s only a few steps better than raw REST, though, and pretty painful all the way.

However, for public info in a public bucket, you can just fetch(URL) it, right? No need for clients; the buckets speak REST. I think you just need to set a CORS policy on the bucket and the data is available over vanilla HTTP.
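
For example, a minimal sketch in a notebook cell (the bucket and object names here are hypothetical placeholders):

data = fetch("https://storage.googleapis.com/my-public-bucket/data.json") // hypothetical public object
  .then((response) => response.json()) // Observable implicitly awaits the promise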

1 Like

Thanks @tomlarkworthy, I think the issue is that our bucket does not have CORS enabled. I’ll give this a try and confirm it works. This must be how @enjalot is getting data from Google Cloud Storage here: Pigs / Ian Johnson / Observable

@cornhundred Yes, I manually enabled CORS on my bucket via Configuring cross-origin resource sharing (CORS) | Cloud Storage to use it in the notebook.
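
For anyone following along, a minimal sketch of such a policy (the bucket name is a placeholder): save it as cors.json and apply it with gsutil. The wide-open "*" origin is the simplest choice for a bucket that is already public.

[
  {
    "origin": ["*"],
    "method": ["GET"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]

gsutil cors set cors.json gs://my-bucket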

1 Like

Hi, I have to read from a private bucket that has CORS enabled and can be accessed by providing a token. Are there any example ObservableHQ notebooks that show how to log in to a Google account to obtain a token? I’m doing something similar in Python here:

from google.colab import auth
from google.auth import default
import google.auth.transport.requests

# Prompt for an interactive Google sign-in inside Colab
auth.authenticate_user()

# Pick up the default credentials established by that login
creds, _ = default()

# Refresh to populate a short-lived OAuth access token
request = google.auth.transport.requests.Request()
creds.refresh(request)

token = creds.token

and I’m able to use this token in an ObservableHQ notebook to access the data in my private bucket. I’m wondering if I can get this token from within a notebook?
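
For reference, once the token is pasted into a cell, reading an object through the GCS JSON API looks roughly like this (the bucket and object names are hypothetical placeholders):

data = fetch(
  `https://storage.googleapis.com/storage/v1/b/my-private-bucket/o/${encodeURIComponent("path/to/data.json")}?alt=media`,
  { headers: { Authorization: `Bearer ${token}` } } // token copied over from Colab
).then((response) => response.json())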

1 Like

So if you want third-party access, it’s a huge pain and you need somewhere to put secrets. I do have a Google OAuth login example here:

However, if you just want to give yourself write access and everybody else public read access (or just yourself read access), the simplest is to use an S3-compatible API key, which you can put in an Inputs.password and bind to local storage (@tomlarkworthy/localStorageView) so you only have to do it once per device. These S3-compatible keys don’t expire.
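
A rough sketch of that pattern using plain localStorage instead of the localStorageView helper (the storage key name is my own):

viewof s3Key = Inputs.password({
  label: "S3 key",
  value: localStorage.getItem("s3-key") ?? "" // prefill with the copy saved on this device
})

persist = localStorage.setItem("s3-key", s3Key) // reruns, and re-saves, whenever the input changes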

With GCS in S3 compatibility mode you can just use any of the zillions of S3-lite clients, or even AWS’s browser S3 client. I personally have converged on the S3 API being the one true storage API.
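
As a sketch of what that looks like, assuming the AWS SDK v3 S3 client loaded from a CDN and an HMAC key pair created under the bucket’s Interoperability settings (bucket, object, and variable names are placeholders):

S3 = import("https://cdn.skypack.dev/@aws-sdk/client-s3")

s3 = new S3.S3Client({
  region: "auto", // GCS's interop endpoint is not picky about the region string
  endpoint: "https://storage.googleapis.com", // the S3-compatible XML API
  credentials: { accessKeyId: s3Key, secretAccessKey: s3Secret } // HMAC pair from the Interoperability settings
})

object = s3.send(new S3.GetObjectCommand({ Bucket: "my-bucket", Key: "data.json" }))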

1 Like

Thanks, I’ll look through the notebook. My use case would be that I create a notebook pointing to a dataset in a private Google bucket that I and someone else have access to via our Google accounts. The notebook user would then click the Google login link to receive a temporary token for read-only access to their data via CORS. Would I need to use secrets in my case?

Yes, the CLIENT_SECRET in OAuth. There is a device login flow for OAuth that avoids it, which is hinted at in the Google docs: OAuth 2.0 for TV and Limited-Input Device Applications | Authorization | Google for Developers

So maybe this is a better way, but I have not tried it. Let me know if that is good enough for GCS access.
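
For completeness, the device flow in that doc boils down to two requests (a sketch only; CLIENT_ID and CLIENT_SECRET are placeholders, and Google restricts which scopes this flow allows, so it may not cover GCS):

// Step 1: obtain a device code and a URL for the user to visit
deviceCode = fetch("https://oauth2.googleapis.com/device/code", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams({
    client_id: CLIENT_ID, // placeholder: an OAuth client of type "TVs and Limited Input devices"
    scope: "https://www.googleapis.com/auth/devstorage.read_only" // may not be on the flow's allowed-scope list
  })
}).then((r) => r.json())

// Show deviceCode.verification_url and deviceCode.user_code to the user, then...

// Step 2: poll the token endpoint until the user has approved
tokens = fetch("https://oauth2.googleapis.com/token", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams({
    client_id: CLIENT_ID,
    client_secret: CLIENT_SECRET, // Google's variant still sends this, though it is not treated as confidential for this client type
    device_code: deviceCode.device_code,
    grant_type: "urn:ietf:params:oauth:grant-type:device_code"
  })
}).then((r) => r.json())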

OK, let me give the OAuth notebook a try and I’ll let you know how it goes.

I think with the device OAuth flow you won’t need that OAuth notebook. Also consider just sharing an API key with the other person if you know them; you can also create a separate key and revoke it later if needed.

Thanks, I’m a bit lost with all the options :slight_smile: but I’ll read it over again and see if I can work it out. I guess another option might be to set up an enterprise account with login using Observable Framework: Observable Framework for Private Data (login required) - #2 by tophtucker

Yes, I agree it’s too complicated. For personal stuff I shove API keys in localStorage. I have a Google Doc with all my keys, and when I use a new device I copy and paste the key over.

If I want to collaborate in a group I trust, I put a shared key in URL params and share the link.
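
In a plain page that is a one-liner, though note that a sandboxed notebook cell may not see the outer page’s query string:

key = new URLSearchParams(document.location.search).get("key") // e.g. a link ending in ?key=...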

I can imagine a more advanced version of this where everybody gets their own key to put in their own personal password manager. I kinda think OAuth is overkill for small groups; it’s only justified for actual products and not really the right level of ROI for ad hoc collaborations.

1 Like

Thanks for the advice. Unfortunately, the Google bucket I’m using was created within a Terra.bio workspace and I don’t currently have permission to create a key (assuming I’m understanding correctly).

I am able to get a short-lived token from a Terra notebook (where the notebook knows the user is authenticated), or by using the Google Colab approach above: logging in to my Terra-associated account and copying the token over to ObservableHQ.

So for now we might just recommend that users create the token on Terra and manually copy it over to ObservableHQ; our use case is that Observable will create a visualization of a dataset that they are analyzing in a Jupyter notebook. Or we can use Google Colab to quickly create a token if a user wants to visualize their data on Observable without using Terra.

Who owns the storage bucket? We can do a call if you want.

1 Like

The bucket is owned by Terra. Sure, a call would be great if you’re available. I’ll message you my email.