Hello! First off, thank you for making ObservableHQ; it is absolutely amazing!
I am in the situation where I have created a notebook and made it public, but I have now been informed that it contains information that should not be public.
Once published, a notebook can be imported by other people — and their code can start to depend on yours — something that we don’t want to break.
However, this logic seems odd to me for two reasons:
ObservableHQ must give users the ability to remove their public data when they wish, and indeed I can do this by deleting the notebook, making a breaking change to the notebook, or even deleting my account. So putting a time limit on unsharing doesn’t remove the problem of code breaking, it just gives a poorer UX for the user who created the notebook. The stability contract is between the user who created the notebook and the user using it.
Whilst it is a core feature of ObservableHQ, the vast majority of notebooks are not imported by other notebooks.
I could see a case for a popup “Are you sure? Other notebooks may depend on this notebook”, but preventing unsharing does not seem ideal to me.
This might also be a good opportunity to patch out the flaw that allows you to keep a notebook at the front of the “recent” list by constantly unpublishing and publishing it within the 24h window.
You’re right. We wanted to make unpublishing more deliberate, and more difficult — to avoid breaking the social contract between you and anyone who might depend on your notebooks.
But as you point out, of course, that contract necessarily needs to contain a large amount of wiggle room.
There’s an additional caveat here, that in the future, we’d like to make it so that unpublishing a notebook only removes the ability to access it directly, but that imports of previously published versions continue to work.
So, with that in mind, and it being the case that you have a notebook here that contains information that shouldn’t be public, I would recommend:
Forking a private copy of the existing published notebook.
I am glad we’re talking about this; thank you @harry and @jashkenas. While I am a huge fan of file attachments, I worry a bit that this complicates the issue of accidentally publishing private data.
A vignette and some reflection:
Prior to attachments, I was in the practice of linking data files from a CORS-enabled data store. In September, I left a job where I was referencing public data, and when I left and closed the data store I was using for work, all my notebooks broke. I wished at that moment that attachments had come out sooner, as then my notebooks would still work without me having to go back and re-link all the data.
But the opposite case is now our (potential) concern: If I were to have uploaded some file as an attachment, and then later some supervisor came along and was bothered that I was sharing ‘sensitive’ information (even if it was already public in one or another format—a situation that got a friend of mine in deep trouble once), that’s it: the notebook may have already been forked (copying the attachment), and now these data are out in the wild. In the current context, this damage can be mitigated slightly with Jeremy’s solution: fork the public notebook and then trash it. But how would this look with persistent copies of the notebook available for imports? Would imports somehow be limited so that data in attachments aren’t exportable, but only functions that are written into Observable cells? If information is persistent, does this effectively mean that users cannot truly delete their data? How does that play out by geographical location of users? And does it mean that all Observable notebooks remain ‘owned’ by Observable?
@aaronkyle @jashkenas - thank you for the forking tip and responses. This has unblocked me on my problem and allowed me to keep my notebook, which is fantastic.
As for future design considerations, I still think there is value in empowering the notebook creator to have control over the data (file attachments and code alike) they have published. This is becoming the norm in, for example, social networks, where I can choose to revoke my public content at any time. While ObservableHQ generally deals with less sensitive information, is this not a requirement of laws like GDPR?
It could be integrated into ObservableHQ in a user-friendly way, where it is (a) clear to the user when a notebook is broken because it is depending on resources that no longer exist; and (b) when unpublishing a notebook it is made clear which (at least public) notebooks are dependent on it.
I would expect that removal of copyrighted data or data containing PII is so rarely required that offering tools would likely be a disproportionate effort. You can always contact Observable support to get your data removed.
Sure, but you can’t retroactively revoke the license under which your content got forked:
When you publish a notebook, you grant each User of Observable a nonexclusive, worldwide license to use, display, and perform Your Content through the Observable Service and to reproduce Your Content solely on Observable as permitted through Observable’s functionality (for example, through forking).
[…]
We will not delete Content that you have contributed to other Users’ notebooks or that other Users have forked from your notebooks.
Your published notebooks will no longer be accessible on the Observable website, but copies of published notebooks will be retained, so that any forks, imports and embeds that may depend on them will continue to function.
(@mbostock You may want to define “publish” in your terms, especially with regard to shared notebooks.)
Hey mootari, thanks for the response. I agree with your point regarding copyrighted data and PII. I was rather trying to say that in the modern spirit of users being able to remove things they have published, it would be user-friendly for ObservableHQ to also follow this spirit, even if removing a notebook does not remove all copies of it.
My use case in mind is that I’ve accidentally published code that I would no longer like to be associated with my public account, but the issue is not so severe that a formal data removal request is required. For example, I think my code is low quality, outdated, or no longer relevant, or I have received an informal complaint about it from a third party.
If the notebook is not being used by any others, then we could simply unpublish; or if it is used by others, could we not still remove it from my list of public notebooks without breaking downstream notebooks? Essentially, formalizing the “fork-then-delete” workaround suggested earlier as an “unpublish” button?
I should also stress that I think this isn’t an urgent feature request from me; I just wanted to raise the issue as ObservableHQ continues to evolve.