For a research article I’m currently working on, I made a small, accompanying Observable notebook. Now that I want to cite the notebook in the article, I’ve been looking around for best practices in the literature:
So far, I’ve found a few Observable notebooks being cited in Nature articles (e.g. by Sergei Pond et al.). In these articles, the notebook URLs were simply pasted into the article body (or footnotes):
However, my co-authors are not (yet) convinced of the “URL approach”. Their main argument is that due to potential link-rot the notebook might become unreachable over time.
As a possible solution, I’ve been thinking about potential ways to assign a DOI number to a specific version of a notebook. One approach would be to Export → Download code, and upload the resulting page to Zenodo. For the above notebook, this results in the following DOI: 10.5281/zenodo.4770203.
The approach is similar to the way GitHub recommends making repositories citable.
A major drawback of this approach is that the first thing you see when following the DOI hyperlink is the Zenodo repo, and not the actual Observable notebook.
In an ideal scenario, I guess I would like to have a DOI that points to a specific notebook version and, if that version is not available (404), falls back to something like the Zenodo repo. Unfortunately, as far as I know, that’s not possible using Zenodo…
What do you think? Have you been citing/adding notebooks to your research articles yet? If so, how did you do it? And in general, what do you think about having DOIs in notebooks?
Really curious about your opinions!
Not really my field of expertise, but you could reference the archive by the notebook ID. I’ve prepped a demo here:
Edit: You can even archive the download bundle via the wayback machine:
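As a rough sketch of how such Wayback Machine links could be put together in Python: this assumes the publicly visible `web.archive.org/save/` and `web.archive.org/web/<timestamp>/` URL patterns, and the helper names are hypothetical. It only builds the URLs; it does not submit anything.

```python
WAYBACK_SAVE = "https://web.archive.org/save/"
WAYBACK_VIEW = "https://web.archive.org/web/"


def wayback_save_url(target_url: str) -> str:
    """URL that asks the Wayback Machine to capture `target_url` when visited."""
    return WAYBACK_SAVE + target_url


def wayback_snapshot_url(target_url: str, timestamp: str) -> str:
    """URL of a capture of `target_url`; `timestamp` is YYYYMMDDhhmmss digits."""
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be 14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK_VIEW}{timestamp}/{target_url}"
```

The snapshot URL is the one that would go into a paper, since it pins a specific capture rather than whatever the target serves today.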
Thanks @mootari, your notebook is a great help!
Also using the wayback machine is a clever idea to permanently save the notebook!
Compared to the Zenodo approach, the wayback machine seems to be a little less convenient from a user perspective though, as the download is started immediately and I’d guess a lot of users wouldn’t know what to do with the resulting .tgz archive. But still it’s a cool idea!
I’m not sure that having to download files one-by-one is any better though. At the very least you’ll probably want to ensure that the only file offered for download is the archive file.
If you’re set on Zenodo, I would probably look into creating an integration. However, keep in mind that Observable’s default code license only grants reuse and modification within the platform, and different licenses may have different requirements for redistribution.
That’s a totally valid point! And given the main “issue” of preventing link-rot it’s definitely a clean solution (and probably cleaner than the Zenodo one)!
The longer I think about it though, I guess just sticking to the regular notebook URL (with a version pinned) seems to be the most convenient method from a reader’s perspective. As long as the Observable platform exists, and the notebook’s URL doesn’t change for any other reason, people following the link just get the intended content. For instance, in the Nature paper cited above, you can click the link, quickly browse through the notebook, and then switch back to the article without any hassle.
One possible compromise could be to have the notebook’s URL in the body of the article, with a footnote specifying the archive (wayback machine) URL in case the link has turned bad.
In any case, the archive requires quite some technical knowledge in order to be displayed, and even if one has the required knowledge, it still takes time to get it to work. That’s definitely cumbersome if you (the reader) just wanted to quickly check something in the notebook, only to find out you need x minutes to get it to work.
I think that is exactly the beauty of using Observable in research articles: one can quickly switch context (from paper to notebook and vice versa) and get their hands on a dataset / interactive visualisation to get a better feel for it!
You could still reference the notebook via its internal ID, so that it always redirects to the current URL (should the author decide to reset existing slugs).
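For illustration, a small Python sketch of building such an ID-based link. It rests on assumptions that are not confirmed in this thread: that notebook IDs are 16 lowercase hex characters, that `https://observablehq.com/d/<id>` is the ID-based address, and that a version can be pinned with an `@<version>` suffix. The function name and the ID in the usage note are placeholders.

```python
import re
from typing import Optional

# Assumption: notebook IDs are 16 lowercase hexadecimal characters.
NOTEBOOK_ID = re.compile(r"[0-9a-f]{16}")


def notebook_url(notebook_id: str, version: Optional[int] = None) -> str:
    """Build an ID-based notebook URL, optionally pinned to a version."""
    if not NOTEBOOK_ID.fullmatch(notebook_id):
        raise ValueError(f"unexpected notebook ID format: {notebook_id!r}")
    url = f"https://observablehq.com/d/{notebook_id}"
    if version is not None:
        url += f"@{version}"  # assumed version-pinning suffix
    return url
```

For example, `notebook_url("0123456789abcdef", version=7)` builds a link that does not depend on the author's username or the notebook's slug.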
Related to this topic, it might also be worth considering the recent developments at GitHub and the CITATION.cff push: https://twitter.com/natfriedman/status/1420122675813441540?s=20
I’ve worked with DOIs both as a scientist and as a developer in a university library and my conclusion is that DOIs in general are a horrible idea.
They introduce weird man-in-the-middle attacks (if you want to censor a specific citation you just have to attack the servers of the particular DOI’s issuer) and reliance on a central authority, version drift errors (does the DOI actually point to the preprint? the final version? the revised version?), and miscatalog errors (the DOI points to the conference welcome and timetable flyer instead of the published proceedings) that are completely avoidable with content-addressable IDs, a.k.a. hashes.
The most insane thing I’ve ever seen my librarian colleagues do is a tutorial for turning Git SHA-1 commit hashes into DOIs. Where you take a cryptographically secure ID that is guaranteed to point to a specific piece of content and replace it with what is essentially a shortened URL.
Don’t get me wrong, Observable’s notebook slugs/IDs are just as bad if not worse.
But what we really need if we want to do science with Observable, and I think for proper citation, reproducibility, and access in science in general, is to expose hashes of the source code of our papers (.tex, .observablejs, …) as self-describing content-addressable URNs, e.g.
The cool thing about those is that you’re not only sure that you always get the right paper/code/dataset for your citation, it also comes with a free publisher-less distribution service. Simply prepend the content URN with magnet:?xt= and throw it into a BitTorrent client of your choice ;).
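A minimal Python sketch of the idea, under stated assumptions: it uses the `urn:sha1:` form (Base32-encoded SHA-1 of the file bytes) as one example of a content URN, which is not necessarily the scheme meant above, and the function names are hypothetical. Note that BitTorrent clients generally expect `urn:btih:` info-hashes of torrent metadata, so whether a given client can resolve a plain file-hash URN depends on the client and network.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Content URN for `data`: Base32-encoded SHA-1 digest with a urn:sha1: prefix."""
    digest = hashlib.sha1(data).digest()  # 20 bytes -> 32 Base32 chars, no padding
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def magnet_link(content_urn: str) -> str:
    """Prepend the magnet prefix to a content URN, as described above."""
    return "magnet:?xt=" + content_urn


if __name__ == "__main__":
    # Illustrative input only; in practice this would be the bytes of the
    # paper, notebook source, or dataset file.
    source = b"example notebook source"
    print(magnet_link(sha1_urn(source)))
```

Because the identifier is derived from the bytes themselves, anyone holding the file can recompute the URN and check that it matches the citation.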
And if you want a proof of publication to mark that you are the first one to publish said work at a given time, simply take the biggest blockchain/crypto-currency and throw the hash in there as a comment/annotation/metadata on a transaction. Once the transaction completes and is buried, you have proof that the work was published at the given time, no expensive open access fee needed.