backup method?

Attention, there’s a serious flaw in both my script and Bryan’s version:

  • The code uses agent.query(...) for the step where the credentials get passed to GitHub via POST.
  • This appends the data to the URL instead of passing it in the body.
  • As a result your password will most likely end up in GitHub’s logs.

If you’ve used either one of those scripts, I’d recommend changing your GitHub password, just to be safe. I’ll fix my gist shortly.
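To illustrate (a sketch only, with a placeholder URL; `agent.query(...)` follows superagent’s convention of serializing data into the query string, while `.send(...)` would put it in the request body):

```javascript
// What passing credentials via the query string effectively produces:
// the password becomes part of the URL, which servers routinely log.
const creds = { login: "user", password: "hunter2" };
const leakyUrl = `https://github.com/session?${new URLSearchParams(creds)}`;

// What sending them in the POST body produces instead:
// the URL stays clean and the credentials travel in the request payload.
const safeUrl = "https://github.com/session";
const body = new URLSearchParams(creds).toString();
```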

2 Likes

Gist has been updated. Here’s the diff: https://gist.github.com/mootari/511b751e325db8316bb3138dcb0a7393/revisions#diff-168726dbe96b3ce427e7fedce31bb0bc

1 Like

Wow, thanks for noticing this. I’ve updated my script just now too.

1 Like

PSA: Observable has dropped the “beta.” subdomain. Be sure to update the site path and cookie domain in your scripts.
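A minimal sketch of the affected values (the constant names are my own, not from any of the scripts):

```javascript
// Illustrative constants; names are assumptions, not from the actual scripts.
// With the "beta." subdomain gone, both values drop the prefix.
const SITE_URL = "https://observablehq.com";
const COOKIE_DOMAIN = ".observablehq.com";
```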

1 Like

I’ve added this block to @bgchen’s version of the script:

      case 'backup-user': {
        // Back up the public notebooks of @user.
        const user = (process.argv[3] || "").replace(/^@/, "");
        const dir = process.argv[4] || "data";
        if (process.argv[3]) {
          let before = "";
          const dirName = `${dir}/${user}`;
          try {
            fs.mkdirSync(dirName);
          } catch(e) {} // Directory may already exist.
          do {
            // Fetch one page of the user's documents, newest first.
            const nbdat = await api.get(`/documents/@${user}${before}`);
            for (const nb of nbdat) {
              const fileName = `${dirName}/${nb.slug.replace("/", ".v")}.json`;
              let savedContent;
              try {
                savedContent = JSON.parse(fs.readFileSync(fileName, 'utf8'));
              } catch(e) {} // No previous backup of this notebook yet.
              if (savedContent && savedContent.version >= nb.version) {
                console.log(`Skipping ${nb.title}`);
              } else {
                console.log(`Downloading ${nb.title}`);
                const doc = await api.get(`/document/${nb.id}`);
                fs.writeFileSync(fileName, JSON.stringify(doc), {flag:'w'});
              }
            }
            // Paginate using the update time of the oldest document on this page.
            before = nbdat.length ? `?before=${nbdat.pop().update_time}` : "";
          } while (before);
          break;
        }
      }

Usage:

> node index.js backup-user @fil

will create a data/fil/ directory containing all my published notebooks. Note that if a notebook is published but has been modified since its last publication, what I receive and save is the most current version, not the published snapshot.

(Should we move this thread to a GitHub project?)

2 Likes

I’m almost done setting up a repo, just wanted to clean up a few things beforehand.

Thanks for sharing! I was wondering about your backup requirements while planning the high-level API and helpers.

1 Like

This has been failing with:

(node:35847) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'value' of undefined
    at ObservableAPI.authorizeWithGithub (/Users/fil/Source/observable/backup/api.js:100:29)
    at processTicksAndRejections (internal/process/next_tick.js:81:5)

(last time it worked was about a week ago)

I noticed this too. I think the issue is that the Observable page now generates the “T” token client-side and posts that to the server, whereas the scripts have been getting this token from the cookie (?).

(I’ve been waiting for @mootari to share his repo so that we’ll have something nicer to build from than my crude edit :smile: )

Yep, the relevant code is:

      onClick: ()=>{
        n && n(),
        window.location.assign(function(e) {
          return `https://github.com/login/oauth/authorize?scope=user:email&client_id=1a8619df27715d9d2c97&state=${pu()}&redirect_uri=${`https://api.observablehq.com/github/oauth?path=/loggedin${e}`}`
        }(r))
      }

and

  function pu() {
    const e = document.cookie.match(/(?:^|;)\s*T\s*=\s*([0-9a-f]{32})(?:$|;)/);
    if (e)
      return e[1];
    const t = (n = 16,
    Array.from(crypto.getRandomValues(new Uint8Array(n)), e=>e.toString(16).padStart(2, "0")).join(""));
    var n;
    const r = new Date(Date.now() + 1728e5);
    return document.cookie = `T=${t}; Domain=.observablehq.com; Path=/; Secure; Expires=${r.toUTCString()}`,
    t
  }

So, so sorry about the delay. These last days I’ve been either too swamped or too tired to finish setting up the repo. I’ll set it up this weekend, promise. :slight_smile:

1 Like

I’ve updated the gist to have ensureToken() generate the token by itself instead of fetching it from the server.

3 Likes

Thanks! I used your work to update my version just now:

I made one change to your new ensureToken. Instead of:

ensureToken(regenerate = false) {
  if (!regenerate && this.getToken()) return;

I use:

ensureToken(regenerate = false) {
  if (!regenerate && this.getToken() && this.getToken().value !== '') return;

The third clause is necessary since after authentication, the response from https://observablehq.com/loggedin has a set-cookie header which reads set-cookie: T=; Max-Age=0; Domain=.observablehq.com; HttpOnly; Path=/; Secure.

Edit: updated per @mootari’s comment below.

2 Likes

I noticed that too, but wrongly dismissed it as irrelevant. I’d add the check against value to getToken() though, otherwise code might fetch a token cookie with an empty value. I’ve updated my gist accordingly.
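For instance, the updated check in getToken() might look like this (a sketch against a hypothetical cookie jar of {key, value} entries, not the repo’s actual code):

```javascript
// Sketch only: the cookie-jar shape ({key, value} entries) is an assumption.
// An empty "T" value (as set by the logout Set-Cookie header) counts as no token.
function getToken(cookies) {
  const token = cookies.find(c => c.key === "T");
  return token && token.value !== "" ? token : null;
}
```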

1 Like

The repo is now available:

4 Likes

Is this still working for you? I stopped using it quite a while ago, and now it doesn’t want to connect anymore.

Never actually got around to using it. :slight_smile: Does the authentication fail, or can you at least retrieve a token?

Edit: The login now has an extra step, and the CSRF token only gets set after one enters the name. :thinking:

1 Like

Don’t worry—I just wanted to check if it was just broken for me, but I don’t really need it for the moment. I still would like to be able to bulk download easily for backup and grep :slight_smile:

No one else has complained either. I may have to assume that no one is using it. :cry:

Anyway, can I ask you to open an issue in the mootari/observable-client repo on GitHub?

1 Like

I dunno if I should necro these threads, but IMHO it makes sense to have only one backup thread. My backup solution exports to storage ordered by update timestamp, and checks version IDs so it can stop early if nothing has changed.

I’ll probably automate it with cron once I get some confidence in it.

I am not really planning on making this a service, but it’s pretty easy to copy if people want.

2 Likes

In the end I did not like the previous approach: it required too much manual triggering, it was hard to set up in the first place, and the end result was a tar archive that was hard to interact with. So, based on the lessons from the previous one, I have taken a fresh approach.

This new backup solution triggers a GitHub Action that unpacks the tar archive, syncs with GitHub, and runs automatically after every publish. It also works with non-public team notebooks! You can point everything at a common repository, because the notebooks are unpacked into a directory mirroring their URL; you can take a look here. It only took me 270 GitHub Actions attempts before I got it!

4 Likes

The intent of the GitHub backups notebook is that you set it up once in a personal backup notebook, which can then be transitively imported everywhere you need backups, so you avoid having to configure the GitHub token each time.

This was not working under certain conditions; thanks @jimpick for reporting the issue, which is now solved. As you can see, I personally have quite the collection of backups now: observable-notebooks/@endpointservices at main · endpointservices/observable-notebooks · GitHub

which only requires me to import my footer notebook

That footer does a few other useful things, like installing an error-reporting framework and usage analytics.

3 Likes