Cached BigQuery results for a public notebook?

Hi! I see related questions on the topic of sharing database connections in public notebooks, and I understand why that can be complex/risky.

I don't especially need a public version of a notebook to actually run queries. But I'd like to work on a notebook iteratively using BigQuery, modifying my queries as I go, before publishing the results for broader consumption. That is: I'd like the public version to carry the results of the last run.

I can see a simple way to work around this (e.g., save the data from the query, then upload the same data back into the same notebook?), but that requires manual intervention, and it leaves a couple of errors on the public page if I want to keep the queries in place for readers.

I've been using Observable for ~24 hours, and I love it so far. Perhaps I've missed something obvious! Is there a no-brainer way of sharing the results of database queries in public notebooks without the errors?

1 Like

Hi, and welcome to Observable! Amazing that you got to this point in only 24 hours :slight_smile:
Your current workflow is exactly right for now, and you're right, it is a bit too manual. One of the ideas we are discussing is how to make it easy to cache these results for notebook consumption (private or public), possibly as file attachments. If you want to suggest a workflow, we would love for you to log a feature request in our feedback repository here; then you can monitor the progress and suggest ideas towards the best solution.

Thanks!

2 Likes

Got it! I threw something up here: cached bigquery results for a public notebook · Issue #58 · observablehq/feedback · GitHub

I have been working towards this EXACT use case. So your question prompted me to finally put all the pieces together.

I hope to provide dynamic file storage at some point, but it's easy enough to bring your own Firebase Storage bucket for now.

1 Like

That's a bit of a journey – I had to poke some things in Firebase – but it definitely gets me a lot of the way there. Thanks!

There's a different downside to this approach: the jobs.query endpoint returns separate schema/values arrays in the response, not the array of key:value objects that the DatabaseClient query call returns. As a one-off I can mangle my data fairly easily, but is there an alternative API endpoint to retrieve results, and/or has somebody already written code to parse these cleanly?
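
For reference, the mangling I have in mind is roughly the sketch below. It assumes the usual jobs.query response shape (schema.fields plus rows[].f[].v); note that numeric values come back as strings, and nested or repeated fields would need extra handling:

// Convert a jobs.query REST response into an array of {column: value} objects,
// similar to what the DatabaseClient query() call hands back.
function rowsToObjects(response) {
  const fields = response.schema.fields;
  return (response.rows || []).map(row =>
    Object.fromEntries(row.f.map((cell, i) => [fields[i].name, cell.v]))
  );
}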

Maybe there's some code in the Node.js client? But yeah, they never intended BigQuery to be called client-side, so the JS API is wedded to Node.js, probably gRPC-based, and not portable.

I'm kind of expecting the workflow would be to stay in the BigQuery app and only export a single query or two; it's not good for exploration, but OK for plumbing.

That said: if I skip the JS API pieces and instead keep the DatabaseClient query call, then feed the results of query() into Firebase and retrieve them from Firebase into a different variable, I think I wind up somewhere similar. I can feed the data retrieved from Firebase into vega-lite plots as before; I imagine there might be consistency issues if I'm not careful, but I haven't properly road-tested it yet.
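
Roughly, the shape I have in mind in the notebook is the sketch below (the cell names and storage path are made up, results is the output of the DatabaseClient query cell, and firebase is assumed to be an already-initialised Firebase web SDK instance with Storage set up as in the notebook above):

// Write the latest query results into Firebase Storage as a JSON cache.
// This cell only succeeds for me, while I'm editing with the database connection available.
cacheWrite = firebase
  .storage()
  .ref("caches/bigquery-notebook/results.json")
  .putString(JSON.stringify(results))

// Read the cached copy back into a *different* variable; the plots are built from this,
// so the published notebook still works even though the DatabaseClient cell errors out.
cached = {
  const url = await firebase
    .storage()
    .ref("caches/bigquery-notebook/results.json")
    .getDownloadURL();
  return (await fetch(url)).json();
}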

The notebook still throws errors about the DatabaseClient when it's published (of course), so it's not totally clean, but it can still read the most recent cached result and build out the plots. That's closer to what I'd like, without as much extra work. (I may also look into whether it's possible to achieve the same using Google Cloud Storage directly, without the extra Firebase layer!)

Oh yeah! You can probably achieve the same role-based access control just with IAM on Cloud Storage buckets, without Firebase Storage. In the notebook we are minting access_tokens, so you can do any cloud Bearer-token thing. It's very easy to access Cloud Storage with a token. That would totally work.

So, today I finally got back to looking at this, and I figured I could have one service account for the database side and another for the cloud storage side. I quickly got lost in the maze of Google documentation, OAuth2 authentication, service keys, API keys, access tokens, and opaque 401s.

Do you have, or know of, a minimum working example for uploading a file to Cloud Storage from JS? I appreciate the Firebase stuff, but in terms of simplifying the surface for me (and our support staff :slight_smile:) it'd be neat to keep things as clean as possible!

OK, I've added how to go direct to a cloud bucket (How to cache BigQuery results in a public Notebook with Firebase Storage or a Cloud Bucket / Endpoint Services / Observable) as an alternative route. Here's the trick:

// POST the query results to the Cloud Storage JSON API as a new object in the bucket.
// getAccessTokenFromServiceAccount mints a short-lived OAuth2 access token from the
// service account credentials (see the linked notebook for that helper).
fetch(`https://www.googleapis.com/upload/storage/v1/b/larkworthy-dfb11.appspot.com/o?name=examples/cache-bigquery/dataalt.json`, {
    headers: {
      // Any Google Cloud access token with write access to the bucket works here
      'Authorization': `Bearer ${await getAccessTokenFromServiceAccount(service_account_credentials)}`,
      // Store the object as JSON so it reads back cleanly
      'Content-Type': 'application/json'
    },
    method: "POST",
    body: JSON.stringify(data)
})
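
And reading the cached object back in the published notebook is just a GET with alt=media. This sketch assumes the object is publicly readable; if it isn't, add the same Bearer header as above:

// alt=media returns the object contents rather than its metadata.
// The object name has to be URL-encoded, so the slashes in the path become %2F.
data = fetch(
  "https://www.googleapis.com/storage/v1/b/larkworthy-dfb11.appspot.com/o/" +
    encodeURIComponent("examples/cache-bigquery/dataalt.json") +
    "?alt=media"
).then(response => response.json())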

Kinda surprised how un-RESTful the storage API is :confused: Objects: insert  |  Cloud Storage  |  Google Cloud

You could also get your hands on an access_token by logging in a user rather than embedding a service account.
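
If you go that route, one browser-side option is the Google Identity Services token client. Just a sketch: the client ID is a placeholder, and you need to load https://accounts.google.com/gsi/client on the page first:

// Ask the signed-in user for an access token scoped to Cloud Storage;
// the token can then be used in the same Bearer header as above.
const tokenClient = google.accounts.oauth2.initTokenClient({
  client_id: "YOUR_OAUTH_CLIENT_ID.apps.googleusercontent.com", // placeholder
  scope: "https://www.googleapis.com/auth/devstorage.read_write",
  callback: (tokenResponse) => {
    console.log("access_token:", tokenResponse.access_token);
  }
});
tokenClient.requestAccessToken(); // opens the Google consent popup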

3 Likes

Awesome, thank you! I was at least part way there, but I’d gone down an access token rabbit-hole.

I think this is enough for me to have a notebook working sort of how I want it now. I'll recreate it from scratch next week, just to be sure!

1 Like