Read data from a public Google Cloud Storage bucket

Hi, does anyone know if/how we can load data from a public Google Cloud Storage bucket into ObservableHQ?

2 Likes

Yes. There are many ways, depending on what kind of access control you want on it. Public read access is very easy; if the data is sensitive you might want to put it behind a login. What are your requirements?

It’s also relevant whether you want write access from Observable, or purely read access.

In case you did not know, Firebase Storage is a dedicated client-side wrapper for GCP buckets, so that’s the most full-featured approach. I demoed Firebase Storage on a Twitch stream yesterday:

I also have an example of using GCP Storage as a BigQuery cache here:

But it’s not necessary to go through Firebase if you don’t need all those features.

1 Like

Thanks @tomlarkworthy! For now, we only want to read data from a public GCP bucket, and we want to do that in a public ObservableHQ notebook (so there’s no current login requirement). Preferably, we could talk directly to Google without a Firebase intermediary, just to keep things simple.

I’ll look over the examples (I knew I should have watched that Twitch stream live :grin:). Do you know if we can use Google’s Node.js client libraries on ObservableHQ? https://cloud.google.com/storage/docs/reference/libraries#client-libraries-install-nodejs

The Node.js clients do not work in the browser (they use gRPC), but you can use the machine-generated GAPI clients… that’s how I access BigQuery from the notebook: Google API Client / Tom Larkworthy / Observable

It’s only a few steps better than raw REST, though, and pretty painful all the way.

However, for public info in a public bucket, you can just fetch(URL) it, right? No need for clients; the buckets speak REST. I think you just need to set a CORS policy on the bucket and the data is available over vanilla HTTP.
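
For example, a minimal sketch in a notebook cell (the bucket and object names here are hypothetical placeholders):

data = fetch("https://storage.googleapis.com/my-public-bucket/data.json") // hypothetical public object
  .then((response) => response.json()) // Observable implicitly awaits the promise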

1 Like

Thanks @tomlarkworthy, I think the issue is that our bucket does not have CORS enabled. I’ll give this a try and confirm it works. This must be how @enjalot is getting data from Google Cloud Storage here: Pigs / Ian Johnson / Observable

@cornhundred Yes, I manually enabled CORS on my bucket via Configuring cross-origin resource sharing (CORS) | Cloud Storage to use it in the notebook.
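
For anyone following along, a minimal sketch of such a policy (the bucket name is a placeholder): save it as cors.json and apply it with gsutil. The wide-open "*" origin is the simplest choice for a bucket that is already public.

[
  {
    "origin": ["*"],
    "method": ["GET"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]

gsutil cors set cors.json gs://my-bucket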

1 Like

Hi, I have to read from a private bucket that has CORS enabled and can be accessed by providing a token. Are there any example ObservableHQ notebooks that show how to log in to a Google account to obtain a token? I’m doing something similar in Python here:

from google.colab import auth
from google.auth import default
import google.auth.transport.requests

# Prompt for an interactive Google sign-in inside Colab
auth.authenticate_user()

# Pick up the default credentials established by that login
creds, _ = default()

# Refresh to populate a short-lived OAuth access token
request = google.auth.transport.requests.Request()
creds.refresh(request)

token = creds.token

and I’m able to use this token in an ObservableHQ notebook to access the data in my private bucket. I’m wondering if I can get this token from within a notebook?
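
For reference, once the token is pasted into a cell, reading an object through the GCS JSON API looks roughly like this (the bucket and object names are hypothetical placeholders):

data = fetch(
  `https://storage.googleapis.com/storage/v1/b/my-private-bucket/o/${encodeURIComponent("path/to/data.json")}?alt=media`,
  { headers: { Authorization: `Bearer ${token}` } } // token copied over from Colab
).then((response) => response.json())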

1 Like

So if you want third-party access, it’s a huge pain and you need somewhere to put secrets. I do have a Google OAuth login example here:

However, if you just want to give yourself write access and everybody else public read access (or just yourself read access), the simplest is to use an S3-compatible API key, which you can put in an Inputs.password and bind to local storage (@tomlarkworthy/localStorageView) so you only have to do it once per device. These S3-compatible keys don’t expire.
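
A rough sketch of that pattern using plain localStorage instead of the localStorageView helper (the storage key name is my own):

viewof s3Key = Inputs.password({
  label: "S3 key",
  value: localStorage.getItem("s3-key") ?? "" // prefill with the copy saved on this device
})

persist = localStorage.setItem("s3-key", s3Key) // reruns, and re-saves, whenever the input changes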

With GCS in S3 compatibility mode you can just use any of the zillions of S3-lite clients, or even AWS’s browser S3 client. I personally have converged on the S3 API being the one true storage API.
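
As a sketch of what that looks like, assuming the AWS SDK v3 S3 client loaded from a CDN and an HMAC key pair created under the bucket’s Interoperability settings (bucket, object, and variable names are placeholders):

S3 = import("https://cdn.skypack.dev/@aws-sdk/client-s3")

s3 = new S3.S3Client({
  region: "auto", // GCS's interop endpoint is not picky about the region string
  endpoint: "https://storage.googleapis.com", // the S3-compatible XML API
  credentials: { accessKeyId: s3Key, secretAccessKey: s3Secret } // HMAC pair from the Interoperability settings
})

object = s3.send(new S3.GetObjectCommand({ Bucket: "my-bucket", Key: "data.json" }))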

1 Like

Thanks, I’ll look through the notebook. My use case would be that I create a notebook pointing to a dataset in a private Google bucket that I and someone else have access to via our Google accounts. The notebook user would then click the Google login link to receive a temporary token for read-only access to their data via CORS. Would I need to use secrets in my case?

Yes, the CLIENT_SECRET in OAuth. There is a device login flow for OAuth that avoids it, which is hinted at in the Google docs: OAuth 2.0 for TV and Limited-Input Device Applications | Authorization | Google for Developers

So maybe this is a better way, but I have not tried it. Let me know if that is good enough for GCS access.
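
For completeness, the device flow in that doc boils down to two requests (a sketch only; CLIENT_ID and CLIENT_SECRET are placeholders, and Google restricts which scopes this flow allows, so it may not cover GCS):

// Step 1: obtain a device code and a URL for the user to visit
deviceCode = fetch("https://oauth2.googleapis.com/device/code", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams({
    client_id: CLIENT_ID, // placeholder: an OAuth client of type "TVs and Limited Input devices"
    scope: "https://www.googleapis.com/auth/devstorage.read_only" // may not be on the flow's allowed-scope list
  })
}).then((r) => r.json())

// Show deviceCode.verification_url and deviceCode.user_code to the user, then...

// Step 2: poll the token endpoint until the user has approved
tokens = fetch("https://oauth2.googleapis.com/token", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams({
    client_id: CLIENT_ID,
    client_secret: CLIENT_SECRET, // Google's variant still sends this, though it is not treated as confidential for this client type
    device_code: deviceCode.device_code,
    grant_type: "urn:ietf:params:oauth:grant-type:device_code"
  })
}).then((r) => r.json())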

OK, let me give the OAuth notebook a try and I’ll let you know how it goes.

I think with the device OAuth flow you won’t need that OAuth notebook. Also consider just sharing an API key with the other person if you know them; you can also create a separate key and revoke it later if needed.

Thanks, I’m a bit lost with all the options :slight_smile: but I’ll read it over again and see if I can work it out. I guess another option might be to set up an enterprise account with login using Observable Framework: Observable Framework for Private Data (login required) - #2 by tophtucker

Yes, I agree it’s too complicated. For personal stuff I shove API keys in localStorage. I have a Google Doc with all my keys, and when I use a new device I copy and paste the key over.

If I want to collaborate in a group I trust, I put a shared key in URL params and share the link.
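
In a plain page that is a one-liner, though note that a sandboxed notebook cell may not see the outer page’s query string:

key = new URLSearchParams(document.location.search).get("key") // e.g. a link ending in ?key=...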

I can imagine a more advanced version of this where everybody gets their own key to put in their own personal password manager. I kinda think OAuth is overkill for small groups; it’s only justified for actual products and not really the right level of ROI for ad hoc collaborations.

1 Like

Thanks for the advice. Unfortunately, the Google bucket I’m using was created within a Terra.bio workspace and I don’t currently have permission to create a key (assuming I’m understanding correctly).

I am able to get a short-lived token from a Terra notebook (where the notebook knows the user is authenticated), or by using the Google Colab approach above: logging in to my Terra-associated account and copying the token over to ObservableHQ.

So for now we might just recommend that users create the token on Terra and manually copy it over to ObservableHQ; our use case is that Observable will create a visualization of a dataset that they are analyzing in a Jupyter notebook. Or we can use Google Colab to quickly create a token if a user wants to visualize their data on Observable without using Terra.

Who owns the storage bucket? We can do a call if you want.

1 Like

The bucket is owned by Terra. Sure, a call would be great if you’re available. I’ll message you my email.