Making your notebook citable - Opinions

chrispahm · May 18, 2021, 3:50pm

Hey everyone

For a research article I’m currently working on, I made a small accompanying Observable notebook. Now that I want to cite the notebook in the article, I’ve been looking around for best practices in the literature.
So far, I’ve found a few Observable notebooks being cited in Nature articles (e.g. by Sergei Pond et al.). In these articles, the notebook URLs were simply pasted into the article’s body (or footnotes).

However, my co-authors are not (yet) convinced of the “URL approach”. Their main argument is that due to potential link-rot the notebook might become unreachable over time.

As a possible solution, I’ve been thinking about potential ways to assign a DOI number to a specific version of a notebook. One approach would be to Export → Download code, and upload the resulting page to Zenodo. For the above notebook, this results in the following DOI: 10.5281/zenodo.4770203.

The approach is similar to the one GitHub recommends for making repositories citable.
A major drawback of this approach is that the first thing you see when following the DOI hyperlink is the Zenodo record, and not the actual Observable notebook.

In an ideal scenario, I guess I would like to have a DOI that points to a specific notebook version and, if that is not available (404), falls back to something like the Zenodo record. Unfortunately, as far as I know, that’s not possible with Zenodo…

What do you think? Have you been citing notebooks in, or adding them to, your research articles yet? If so, how did you do it? And in general, what do you think about having DOIs for notebooks?

Really curious about your opinions!

mootari · May 18, 2021, 7:37pm

Not really my field of expertise, but you could reference the archive by the notebook ID. I’ve prepped a demo here:

Edit: You can even archive the download bundle via the wayback machine:
http://web.archive.org/web/20210518200719/https://api.observablehq.com/d/ce99d79090f75f49@803.tgz?v=3
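
For anyone who wants to build these URLs for their own notebook, here is a minimal Python sketch. It assumes the URL pattern of the link above (notebook ID, `@version`, `.tgz?v=3`) carries over to other notebooks, which I have only inferred from that one example. The helper names are made up for illustration, and `/save/` is the Wayback Machine's "Save Page Now" URL form.

```python
from urllib.parse import urlencode

# Base URLs taken from the link above; the pattern is an assumption
# generalized from that single example.
API_BASE = "https://api.observablehq.com/d"
WAYBACK_BASE = "https://web.archive.org"


def bundle_url(notebook_id: str, version: int) -> str:
    """Build the download URL for one pinned version of a notebook."""
    query = urlencode({"v": 3})
    return f"{API_BASE}/{notebook_id}@{version}.tgz?{query}"


def wayback_snapshot_url(target_url: str, timestamp: str) -> str:
    """Build the URL of an existing snapshot (timestamp: YYYYMMDDhhmmss)."""
    return f"{WAYBACK_BASE}/web/{timestamp}/{target_url}"


def wayback_save_url(target_url: str) -> str:
    """Build the URL that asks the Wayback Machine to capture a page."""
    return f"{WAYBACK_BASE}/save/{target_url}"


if __name__ == "__main__":
    url = bundle_url("ce99d79090f75f49", 803)
    print(url)
    print(wayback_snapshot_url(url, "20210518200719"))
    print(wayback_save_url(url))
```

The sketch only builds strings and makes no network requests, so whether a capture succeeds still has to be checked by opening the resulting URL.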

chrispahm · May 18, 2021, 8:57pm

Thanks @mootari, your notebook is a great help!
Also using the wayback machine is a clever idea to permanently save the notebook!

Compared to the Zenodo approach, the wayback machine seems to be a little less convenient from a user perspective though, as the download is started immediately and I’d guess a lot of users wouldn’t know what to do with the resulting .tgz archive. But still it’s a cool idea!

mootari · May 18, 2021, 9:14pm

I’m not sure that having to download files one-by-one is any better though. At the very least you’ll probably want to ensure that the only file offered for download is the archive file.

If you’re set on Zenodo, I would probably look into creating an integration. However, keep in mind that Observable’s default code license only grants reuse and modification within the platform, and different licenses may have different requirements for redistribution.

chrispahm · May 18, 2021, 10:09pm

That’s a totally valid point! And given the main “issue” of preventing link-rot it’s definitely a clean solution (and probably cleaner than the Zenodo one)!

The longer I think about it though, the more I feel that just sticking to the regular notebook URL (with a version pinned) is the most convenient method from a reader’s perspective. As long as the Observable platform exists, and the notebook’s URL doesn’t change for any other reason, people following the link just get the intended content. For instance, in the Nature paper cited above, you can click the link, quickly browse through the notebook, and then switch back to the article without any hassle.

One possible compromise could be to have the notebook’s URL in the body of the article, with a footnote specifying the archive (Wayback Machine) URL in case the link has gone bad.

In any case, displaying the archive requires quite some technical knowledge, and even with that knowledge it still takes time to get it to work. That’s definitely cumbersome if you (the reader) just wanted to quickly check something in the notebook, only to find out you need x minutes to get it running.
I think that is exactly the beauty of using Observable in research articles: one can quickly switch context (from paper to notebook and vice versa) and get one’s hands on a dataset or interactive visualisation to get a better feel for it!

mootari · May 18, 2021, 10:23pm

You could still reference the notebook via its internal ID, so that it always redirects to the current URL (should the author decide to reset existing slugs).

Observable pages saved by the Wayback machine are blank

opened 10:27PM - 27 Feb 21 UTC

jrus

**Description:** Because Observable is entirely reliant on client-side Javascript, every notebook page on observablehq.com fails to render in contexts where pages are cached by third party sites, including on the Wayback Machine.

**Steps to Reproduce:** Navigate to any Wayback Machine cache of an Observable notebook, e.g. http://web.archive.org/web/20201110101204/https://observablehq.com/@jashkenas/inputs

**Expected behavior:** Some version of the page should appear, at least containing the basic text output of html and markdown cells, but ideally containing some version that can fully function standalone with Javascript etc. included.

**Actual results:** Blank gray page with no content. (for beta.observablehq.com links, the result instead was an error page)

**Further discussion:** Notebooks saved by the Wayback Machine don't need to be editable, or support all of the features of the platform, but it would be great to make them legible in some form. The Wayback Machine is a completely indispensable tool for the web, preserving web history and making interlinks stay meaningful into the future, despite changing fortunes of web businesses and individual site managers. It is a shame that already several years of Observable notebook history has been excluded from that archive. If (heaven forfend) Observable the company and platform ever disappears, a saved copy in the Wayback machine would be invaluable. I'm sure it would be a nontrivial amount of work to serve some meaningful self-contained static version of every notebook to the Wayback Machine's crawlers, but it would be much appreciated by future readers.

kjgarza · July 28, 2021, 3:39pm

Related to this topic, it may also be worth considering the recent developments at GitHub and the CITATION.cff push: https://twitter.com/natfriedman/status/1420122675813441540?s=20

somethingelseentire · July 29, 2021, 8:41am

I’ve worked with DOIs both as a scientist and as a developer in a university library and my conclusion is that DOIs in general are a horrible idea.

They introduce weird man-in-the-middle attacks (if you want to censor a specific citation you just have to attack the servers of the particular DOI’s issuer) and reliance on a central authority, version drift errors (does the DOI actually point to the preprint? the final version? the revised version?), and miscatalog errors (the DOI points to the conference welcome and timetable flyer instead of the published proceedings), all of which are completely avoidable with content-addressable IDs, a.k.a. hashes.

The most insane thing I’ve ever seen my librarian colleagues do is a tutorial for turning Git SHA-1 commit hashes into DOIs, where you take a cryptographically secure ID that is guaranteed to point to a specific piece of content and replace it with what is essentially a shortened URL.

Don’t get me wrong, Observable’s notebook slugs/IDs are just as bad, if not worse.
But what we really need if we want to do science with Observable, and I think for proper citation, reproducibility, and access in science in general, is to expose hashes of the source code of our papers (.tex, .observablejs, …) as self-describing, content-addressable URNs, e.g. urn:blake2:9aec6806794561107e594b1f6a8a6b0c92a0cba9acf5e5e93cca06f781813b0b.

The cool thing about those is that you’re not only sure that you always get the right paper/code/dataset for your citation, it also comes with a free publisher-less distribution service. Simply prepend the content URN with magnet:?xt= and throw it into a BitTorrent client of your choice ;).
And if you want a proof of publication to mark that you are the first one to publish said work at a given time, simply take the biggest blockchain/cryptocurrency and throw the hash in there as a comment/annotation/metadata on a transaction. Once the transaction completes and is buried, you have proof that the work was published at the given time, no expensive open access fee needed.
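
To make the hashing idea concrete, here is a rough Python sketch using only the standard library. It rests on several assumptions: I picked BLAKE2b with a 32-byte digest because that gives 64 hex characters like the example URN, but the exact BLAKE2 variant and the `urn:blake2:` layout are not standardized anywhere I can point to. The function names are hypothetical.

```python
import hashlib


def content_urn(data: bytes) -> str:
    """Hash the raw bytes and wrap the digest in a urn:blake2: identifier.

    Assumption: BLAKE2b with a 32-byte digest (64 hex characters).
    """
    digest = hashlib.blake2b(data, digest_size=32).hexdigest()
    return f"urn:blake2:{digest}"


def magnet_link(urn: str) -> str:
    """Prepend magnet:?xt= to the URN, as described above."""
    return f"magnet:?xt={urn}"


def matches(data: bytes, cited_urn: str) -> bool:
    """Check that the bytes you received are the bytes that were cited."""
    return content_urn(data) == cited_urn


if __name__ == "__main__":
    source = b"md`# My notebook`"  # stand-in for an exported notebook file
    urn = content_urn(source)
    print(urn)
    print(magnet_link(urn))
    print(matches(source, urn))         # True: same bytes, same ID
    print(matches(source + b" ", urn))  # False: any change alters the ID
```

The `matches` check is the part a DOI cannot give you: the reader can verify the content locally, wherever it was downloaded from. Whether a given BitTorrent client can actually resolve a magnet link of this form is not something the sketch tests.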
