Writing from Observable to an AWS S3 CSV (or other object)

Inspired by “database failed to fetch”, I’d like to delve deeper into Observable’s capacity to help quickly create interfaces for editing and updating data.

Let’s say that I have data stored as a CSV file, and I’d like to change this data from time to time. Following @bgchen, I can do this by storing the data in a Google Spreadsheet and editing it there, which will effect changes in my notebook. Similarly, I can host this data on GitHub and edit it there. But I would really like to make changes from within the notebook itself.

Using AWS S3 with an open CORS policy, I believe that any public user can theoretically amend information on that object (in this case, our CSV file). Following @mbostock’s clarification, I can also limit my CORS policy to allow edits from only my notebook user. So this seems promising.

But now what?

Without added complications of security keys and secrets, is it correct that an AWS S3 open CORS data object can readily be edited within a notebook? If so, how?

If anyone has any suggestions, I’d really appreciate it. Also, if someone needs an open CORS bucket and random object, here’s one:

https://s3.amazonaws.com/testing.cors/object.csv

Thanks in advance for any help!

It looks like the AWS JS SDK will help, but I haven’t yet gotten any of the online examples working…

Some other references:

Also, just to note - Tom’s module debugger suggested first to try the global leak pattern for require

AWS = require('aws-sdk@2.519.0/lib/aws.js').catch(() => window["AWS"])

…but I found that just requiring the URL worked better:

AWS = require('https://sdk.amazonaws.com/js/aws-sdk-2.519.0.js')

… More soon!

In case anyone is following along:

I am still learning about data persistence and trying to find a means of supplying a data object to Observable that I can then write back into.

Since raising this question, Mike published an Introduction to Serverless Notebooks, which walks through a series of steps and arrives at a persistent data store saved to AWS DynamoDB. I am not (yet) fluent in databases and query languages, and I wish instead to save to CSV or JSON in S3, so I am still figuring out how to make the connection.

Following on with my above lines of research on means of securely and directly writing to S3 via Observable:

AWS SDK for JavaScript allows for this, and @shafdog created an authentication helper notebook called O-AWS. His method is pretty fun: storing credentials in local storage.
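
To make the local-storage idea concrete, here is a minimal sketch. It is not taken from the O-AWS notebook: the helper names and the storage key are made up for illustration, and `storage` stands for anything with `getItem`/`setItem`, such as `window.localStorage`.

```javascript
// Hypothetical helpers, not the O-AWS implementation.
const CREDENTIALS_KEY = "aws-credentials"; // assumed key name

// Store the key pair as JSON in the browser's storage.
function saveCredentials(storage, accessKeyId, secretAccessKey) {
  storage.setItem(CREDENTIALS_KEY, JSON.stringify({ accessKeyId, secretAccessKey }));
}

// Read the key pair back, or return null if nothing has been saved yet.
function loadCredentials(storage) {
  const raw = storage.getItem(CREDENTIALS_KEY);
  return raw === null ? null : JSON.parse(raw);
}

// In a notebook cell (browser only), with the v2 SDK loaded as AWS:
//   const creds = loadCredentials(window.localStorage);
//   const s3 = new AWS.S3({
//     accessKeyId: creds.accessKeyId,
//     secretAccessKey: creds.secretAccessKey
//   });
```

The credentials never appear in the notebook source this way, but anyone with access to that browser profile can read them, so this suits personal use rather than shared machines.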

In the process of working through shafdog’s examples, I learned a bit about limiting permissions within a bucket to a single folder when configuring the IAM user, and then passing information on the folder path as follows:

bucketParams = ({
  Bucket: 's3bucket',
  Delimiter: '/',
  // Trailing slash so that Contents lists the objects directly inside the folder
  Prefix: 'folder/'
})

and to see the contents, I invoke the SDK as follows:

s3.listObjects(bucketParams).promise().then((json) => json.Contents);
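
The folder-limited IAM permissions mentioned above might look something like the following. This is only a sketch: `folderPolicy` is a made-up helper, the bucket and folder names are the placeholders used above, and the list of actions is an assumption about what the notebook needs.

```javascript
// Sketch of an IAM policy document scoped to one folder of one bucket.
function folderPolicy(bucket, folder) {
  const prefix = folder.endsWith("/") ? folder : folder + "/";
  return {
    Version: "2012-10-17",
    Statement: [
      {
        // Listing is a bucket-level action, so it is narrowed with a condition.
        Effect: "Allow",
        Action: ["s3:ListBucket"],
        Resource: [`arn:aws:s3:::${bucket}`],
        Condition: { StringLike: { "s3:prefix": [`${prefix}*`] } }
      },
      {
        // Reading and writing are object-level actions, narrowed by resource ARN.
        Effect: "Allow",
        Action: ["s3:GetObject", "s3:PutObject"],
        Resource: [`arn:aws:s3:::${bucket}/${prefix}*`]
      }
    ]
  };
}

// JSON that could be pasted into the IAM policy editor for the user.
const policyJson = JSON.stringify(folderPolicy("s3bucket", "folder"), null, 2);
```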

Exposing the S3 bucket with entirely open CORS was one step to get this working. The other was setting permissions on the file object. Open CORS permissions on a bucket don’t by themselves allow writing to an object, so the test CSV file I linked to earlier was public and accessible for CORS, but because I never established open write permissions on it, it was (and remains) public ‘read only’.
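
For reference, a CORS rule limited to a single notebook origin could be expressed as below. This is a sketch under assumptions: `corsParams` is a made-up helper, and the bucket name and origin are placeholders. The rule only tells browsers which origins may call the bucket; whether a write succeeds is still decided separately by IAM, the bucket policy, or object ACLs.

```javascript
// Hypothetical helper: builds the parameter object for the v2 SDK's putBucketCors.
function corsParams(bucket, notebookOrigin) {
  return {
    Bucket: bucket,
    CORSConfiguration: {
      CORSRules: [
        {
          AllowedOrigins: [notebookOrigin], // a single origin instead of "*"
          AllowedMethods: ["GET", "PUT"],   // read and write from the browser
          AllowedHeaders: ["*"]
        }
      ]
    }
  };
}

// With a credentialed client `s3` that is allowed to change bucket settings:
//   s3.putBucketCors(
//     corsParams("s3bucket", "https://example.static.observableusercontent.com")
//   ).promise()
```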

Hopefully I’ll be able to share more about moving information from Observable to S3 soon.

Web security is tricky. CORS does not really protect anything: it is only enforced by browsers, so people can still use curl to attack your resources outside of a browser context.
The only way is with a ‘login’, where you prove you are somebody to an identity provider using secret knowledge.
It’s trivial to use Firebase Storage (Google Cloud Storage under the hood) with Firebase login and the web SDK: https://observablehq.com/@tomlarkworthy/firebaseui

The Amazon equivalent is a Cognito user pool with API Gateway in front of your S3 resources. I think AWS Amplify bundles this up and provides a UI too, so take a long look at that.

Hi @tomlarkworthy, and thanks for the reply! :slight_smile:

I really enjoy your many notebooks and appreciate that you share how to integrate Observable, Firebase and Stripe. It’s also super cool how you show how to deploy server-side cells with Firebase, and from there host a web page. While I admittedly haven’t followed your tutorials step-by-step yet, I have read them completely (and your other notebooks – thanks for your announcement, from which I started following you) and they look like they would provide me exactly the functionality that I’ve been looking for.

Thanks also for pointing me to Amplify. The O-AWS notebooks also provided a ‘bonus’ require method for Amplify. The trouble that I have in using it is that it seems to be closely tied to React (or similar) and/or Webpack. This is all pretty far over my head for the moment.

My resistance to moving away from the AWS ecosystem toward Google or other solutions is a matter of legacy. I came to AWS back in 2010 to set up cloud servers on EC2. At the time, I wanted to do things like deploy a wiki to store the institutional documents and linked information for a company I once worked for, create and run our own VPNs to get around ‘the Great Firewall’ (don’t do this now; I think it’s illegal and in any event all my VPNs are found and closed in a day), automate a backup-over-internet routine for company computers, etc. Since that time, I started using AWS for more things, and have come to be particularly reliant on S3 as a data store for GIS files, research papers, photographs, and other digital media. Over the years, I’ve grown comfortable with EC2 and S3 integrations, I have decent capacities using the AWS CLI for this work, and I’ve grown better with IAM roles, but that’s about it. Following Mike’s serverless recipe was one of my first times dabbling with Lambda, and that notebook showed me that there is indeed a way to write data from Observable to S3 securely.

Since starting this post, I am happy to share that I started working with @noise-machines to learn how to make this happen (in response to this call for EoIs). I imagine Thomas (and his colleagues at DataJoy) will be sharing a ‘helper library’ and ‘how to’ tutorial with us very soon, and I look forward to sharing.

I really, really appreciate all the generous people in this community! Thank you for your time, insight, and notebooks!

“Following Mike’s serverless recipe was one of my first times dabbling with Lambda, and that notebook showed me that there is indeed a way to write data from Observable to S3 securely.”

I want to emphasise that the recipe IS NOT SECURE to production standards. The only security is validation of the Origin header:

// Validate the origin.
if (event.headers.Origin !== "https://████████.static.observableusercontent.com") {
  return {statusCode: 403};
}

but this is trivially spoofed with curl (-H "Origin: XXXX"). So the real mechanism of security is that an attacker has to know what the origin is in order to spoof it. In a team notebook this is private information, but in a public notebook you have no such protection. Even in the private setting it’s still not a good choice of secret: it’s impossible to rotate if it ever leaks, and it’s written on every outbound web request.

BTW, I have a mechanism for secrets in public notebooks nearly finished. This will make all these integrations much easier IMHO (including AWS-based ones). The very tricky bit is where to place the client_secret in OAuth… I nearly have that solved.
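
To illustrate why the header check is not authentication, here is the check reduced to a stand-alone function. The origin value is a placeholder and the 200 response is an assumption for the sketch; the real handler does its work after the check.

```javascript
// Placeholder origin, standing in for the notebook's real one.
const ALLOWED_ORIGIN = "https://example.static.observableusercontent.com";

// The same shape of check as in the recipe, with everything else stripped out.
function handler(event) {
  if (event.headers.Origin !== ALLOWED_ORIGIN) {
    return { statusCode: 403 };
  }
  return { statusCode: 200 };
}

// A page on another site cannot forge Origin in a browser, so it is rejected...
const fromOtherSite = handler({ headers: { Origin: "https://evil.example" } });

// ...but a non-browser client can send any header value it likes, so the
// check passes for anyone who knows (or guesses) the expected string.
const fromCurl = handler({ headers: { Origin: ALLOWED_ORIGIN } });
```

The handler has no way to tell the second request apart from one sent by the real notebook, which is why the origin string effectively becomes the secret.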

this may be useful: Achievement Unlocked: Secrets in public notebooks

I have a full example of AWS integration reading and writing to s3. https://observablehq.com/@tomlarkworthy/access-aws

Oh wow! Look at you go! Thank you for sharing and for all your insights!

Sorry, I could not help myself. Neither Mike’s nor my latest AWS notebook really had user authentication for security. I was thinking about how to add that without too much friction, and realized you can store passwords in the notebook as long as you hash them.

This one is also implemented slightly differently, with a direct client-side AWS connection using temporary credentials. So you have lots of options now!

To bring this to a bit of a close (for me, at least), I’d like to share this notebook, which walks through the process of creating an authenticated version of the AWS SDK for JavaScript and using it to read and write from AWS S3 via Observable:

I tried following @tomlarkworthy’s approach to credentialing linked above, but got a bit caught up on the secret injection step. The approach I use only works if you can supply valid credentials to the notebook, following @shafdog’s use of local storage, as described in O-AWS. Beyond that, I don’t contribute too much, except having figured out a few bits on how to construct the getObject and putObject commands so that they work from the browser. While I imagine many of you gurus regularly using AWS and Observable know this kind of thing, I had a real tough time figuring it all out. Thanks to @noise-machines for showing me the way! And thanks to everyone here who gently and patiently guides me!
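
For anyone who wants the general shape of those two calls without opening the notebook, here is a rough sketch. It is not the notebook’s actual code: the helper names are made up, the bucket and key are placeholders, and the CSV serialisation is deliberately naive.

```javascript
// Hypothetical helper: builds putObject parameters for a small CSV.
// rows is an array of arrays of plain values (no commas, quotes, or newlines).
function csvPutParams(bucket, key, rows) {
  const body = rows.map((row) => row.join(",")).join("\n");
  return { Bucket: bucket, Key: key, Body: body, ContentType: "text/csv" };
}

// Hypothetical helper: getObject's Body usually arrives as bytes in the
// browser, so decode it to text before parsing it as CSV.
function bodyToText(body) {
  return typeof body === "string" ? body : new TextDecoder("utf-8").decode(body);
}

// With a credentialed v2 SDK client `s3`:
//   await s3.putObject(csvPutParams("s3bucket", "folder/object.csv", rows)).promise();
//   const data = await s3.getObject({ Bucket: "s3bucket", Key: "folder/object.csv" }).promise();
//   const text = bodyToText(data.Body);
```

For real data, a proper CSV formatter such as d3.csvFormat would be a safer choice than joining values by hand.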

Now that I can read and write into S3, the next challenge becomes how to actually make effective use of this…

Thanks again!

Well done, you did it! Yeah, the development experience with auth is not good. I’m trying to think whether there is a simpler way. It’s annoyingly easier on Google Cloud, as everybody already has a Google account you can use with Cloud permissions, but then Google Cloud doesn’t have many web SDKs, so the functionality is not there. AWS has the functionality but horrible auth, as you have to set things up on a per-project basis. :s

Kudos, you got something going.
