I’ve added this block to @bgchen’s version of the script:
case 'backup-user': {
  // Back up the public documents of @user.
  const user = (process.argv[3] || "").replace(/^@/, "");
  const dir = process.argv[4] || "data";
  if (process.argv[3]) {
    let before = "";
    const dirName = `${dir}/${user}`;
    try {
      fs.mkdirSync(dirName);
    } catch (e) {} // ignore "directory already exists"
    do {
      // List the user's documents, most recently updated first.
      const nbdat = await api.get(`/documents/@${user}${before}`);
      for (const nb of nbdat) {
        const fileName = `${dirName}/${nb.slug.replace("/", ".v")}.json`;
        let savedContent;
        try {
          savedContent = JSON.parse(fs.readFileSync(fileName, 'utf8'));
        } catch (e) {} // no local copy yet
        if (savedContent && savedContent.version >= nb.version) {
          console.log(`Skipping ${nb.title}`);
        } else {
          console.log(`Downloading ${nb.title}`);
          const doc = await api.get(`/document/${nb.id}`);
          fs.writeFileSync(fileName, JSON.stringify(doc), {flag: 'w'});
        }
      }
      // Paginate: continue from the update time of the last document in this page.
      before = nbdat.length ? `?before=${nbdat.pop().update_time}` : "";
    } while (before);
    break;
  }
}
Usage:
> node index.js backup-user @fil
will create a data/fil/ directory containing all my published notebooks. Note that if a notebook is published but has been modified since its last publication, what I receive and save is the most current version.
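Since the target directory defaults to data (process.argv[4] in the code above), you can also pass an existing directory as a second argument; e.g.

> node index.js backup-user @fil /tmp/backup

will write into /tmp/backup/fil/ instead.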
(node:35847) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'value' of undefined
at ObservableAPI.authorizeWithGithub (/Users/fil/Source/observable/backup/api.js:100:29)
at processTicksAndRejections (internal/process/next_tick.js:81:5)
I noticed this too. I think the issue is that the Observable page now generates the “T” token client-side and posts that to the server, whereas the scripts have been getting this token from the cookie (?).
(I’ve been waiting for @mootari to share his repo so that we’ll have something nicer to build from than my crude edit.)
// The page's token helper: read or mint the 32-hex-char T cookie.
function pu() {
  // Reuse an existing 32-hex-digit T cookie if there is one.
  const e = document.cookie.match(/(?:^|;)\s*T\s*=\s*([0-9a-f]{32})(?:$|;)/);
  if (e) return e[1];
  // Otherwise mint a new token: 16 random bytes, hex-encoded.
  const n = 16;
  const t = Array.from(crypto.getRandomValues(new Uint8Array(n)), e => e.toString(16).padStart(2, "0")).join("");
  // Store it in a cookie that expires in 48 hours (1728e5 ms).
  const r = new Date(Date.now() + 1728e5);
  document.cookie = `T=${t}; Domain=.observablehq.com; Path=/; Secure; Expires=${r.toUTCString()}`;
  return t;
}
So, so sorry about the delay. These last days I’ve been either too swamped or too tired to finish setting up the repo. I’ll set it up this weekend, promise.
The third clause is necessary since after authentication, the response from https://observablehq.com/loggedin has a set-cookie header which reads set-cookie: T=; Max-Age=0; Domain=.observablehq.com; HttpOnly; Path=/; Secure.
I noticed that too, but wrongly dismissed it as irrelevant. I’d add the check against value to getToken() though, otherwise the code might pick up a token cookie with an empty value. I’ve updated my gist accordingly.
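A minimal sketch of that check, assuming getToken receives the raw cookie string (the surrounding code here is my guess, not the gist verbatim):

// Return the T token only if the cookie carries a non-empty value, so a
// cleared cookie (T=; Max-Age=0) is rejected instead of yielding "".
function getToken(cookieHeader) {
  const m = (cookieHeader || "").match(/(?:^|;)\s*T\s*=\s*([^;]*)/);
  return m && m[1] ? m[1] : null;
}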
Don’t worry, I just wanted to check whether it was broken only for me; I don’t really need it at the moment. I’d still like to be able to bulk-download easily, for backup and grep.
I dunno if I should necro these threads, but IMHO it makes sense to have only one backup thread. My backup solution exports to storage, ordered by update timestamp, and checks version ids so it can stop early if nothing has changed (see the sketch below).
I’ll probably automate it with cron once I get some confidence with it.
I am not really planning on making this a service, but it’s pretty easy to copy if people want.
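To illustrate the early-stop idea, here is a minimal sketch; listDocuments, loadSaved, save and fetchDocument are hypothetical stand-ins for my actual storage and API calls:

// Walk the documents newest-first: as soon as we hit one whose saved version
// is already current, everything older is unchanged too, so we can stop.
async function backup(user) {
  let before = "";
  outer: do {
    const page = await listDocuments(user, before); // ordered by update_time, descending
    for (const nb of page) {
      const saved = await loadSaved(nb.slug);
      if (saved && saved.version >= nb.version) break outer; // nothing newer remains
      await save(nb.slug, await fetchDocument(nb.id));
    }
    before = page.length ? page[page.length - 1].update_time : "";
  } while (before);
}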
In the end I did not like the previous approach: it required too much manual triggering, it was hard to set up in the first place, and the end result was a tar archive that was hard to interact with. So, based on what I learned from it, I have taken a fresh approach.
This new backup solution triggers a GitHub Action that unpacks the tar export and syncs it with GitHub; it runs automatically after every publish, and it also works with non-public team notebooks! You can point everything at a common repository, because the notebooks are unpacked into directories mirroring their URLs (rough sketch below); you can take a look here. It only took me 270 GitHub Actions attempts before I got it working!
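To illustrate the layout (the exact mapping in the action may differ slightly), a notebook's URL path becomes its directory in the repository:

// Hypothetical sketch of the URL-to-directory mapping:
// https://observablehq.com/@user/my-notebook -> @user/my-notebook/
function notebookDir(notebookUrl) {
  return new URL(notebookUrl).pathname.replace(/^\//, "");
}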
The intent of the GitHub backups notebook is that you set it up once in a personal backup notebook, which can then be transitively imported everywhere you need backups, so you avoid having to configure the GitHub token each time.
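For example, the pattern could look like this in Observable's import syntax (all notebook and cell names here are made up):

// In your personal backup notebook, inject the token once:
import {backup} with {githubToken} from "@yourname/github-backups"

// Anywhere you need backups, import from your personal notebook instead, so
// the configured token comes along transitively:
import {backup} from "@yourname/my-backup-setup"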