Hi all, I’m currently working on a project where Observable notebooks are integrated into a larger application as npm modules. The application passes data to the notebooks, which visualise it. We’ve found that the application itself needs to run parts of the notebook code to get at certain cell values across multiple instances of the notebooks, for example the maximum values shown in each visualisation. So I’m trying to set this up in a way that doesn’t require any rendering (i.e. to optimise speed). My instinct is to use JSDOM, but I’ve tried everything and can’t get past a ‘document is not defined’ error. If you take a look at this repo and:
cd into observable-jest-playground\build
in the CLI, run: node index.js
Here is the full error message I get:
return new Inspector(container.appendChild(document.createElement("div")));
^
ReferenceError: document is not defined
at file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/@observablehq/inspector/src/index.js:57:48
at define (file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/5ef68b6d8020c0c7/5ef68b6d8020c0c7@106.js:279:17)
at Runtime.runtime_module (file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/@observablehq/runtime/src/runtime.js:64:5)
at file:///C:/Users/misha/ngviz/observable-jest-playground/build/index.js:14:22
at ModuleJob.run (node:internal/modules/esm/module_job:194:25)
Can you say more about your project’s scope? For example, do you plan to support any notebook, or do they need to be written specifically for this environment?
I would like to support any notebook if possible, but what I’m particularly interested in is being able to run JavaScript cells and access their values through JSDOM (HTML and Markdown cells are of secondary importance). I tend to use the d3, Arquero and Observable Plot libraries.
What are your expectations regarding security and isolation? Jest runs jsdom with the runScripts: "dangerously" option, which (from what I understand) means that scripts share the global context and can execute arbitrary code within the Node environment, with the process’s permissions.
So it would be preferable not to allow that, but if it’s necessary to make this work, I don’t think it’s a deal-breaker. The JSDOM documentation states: “Again we emphasize to only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised.” A security upgrade I’m going to make in our project is to set the notebooks installed in it to “private” (and use API keys to install them as npm modules), as a guardrail on the Observable end.
My recommendation would be to abandon the idea of a generalized solution and instead tailor both your notebooks and consuming node scripts to this use case. In detail:
Your notebooks should only ever use dynamic imports, never require().
Your node environment should pass a customized Library that replaces UMD requires with ES imports.
Your module instantiation should not pass an observer.
If you still need to shim via jsdom, it should not execute scripts.
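For the second point, one way to replace UMD requires is a require-like function backed by Node’s dynamic import(). This is a sketch under assumptions: makeEsmRequire and esmSpecifiers are hypothetical names, the map should list whatever packages your notebooks actually require (d3, Arquero and Plot appear only because they were mentioned above), and Observable-specific forms such as version suffixes ("d3@7") or several names in one call are deliberately not handled.

```javascript
// Hypothetical mapping from names notebooks pass to require() onto ES module
// specifiers that Node can import(). Adjust to what is actually installed.
const esmSpecifiers = new Map([
  ["d3", "d3"],
  ["arquero", "arquero"],
  ["@observablehq/plot", "@observablehq/plot"],
]);

// Build an async, require-like function that uses dynamic import() instead of
// loading UMD bundles. Only exact, single names found in the map are accepted.
function makeEsmRequire(specifiers) {
  return async function requireAsEsm(name) {
    const specifier = specifiers.get(name);
    if (specifier === undefined) {
      throw new Error(`No ES module mapped for "${name}"`);
    }
    const namespace = await import(specifier);
    // If the namespace has nothing but a default export, unwrap it;
    // otherwise return the namespace so named exports (d3.max, etc.) work.
    const keys = Object.keys(namespace);
    return keys.length === 1 && keys[0] === "default" ? namespace.default : namespace;
  };
}

const requireAsEsm = makeEsmRequire(esmSpecifiers);
```

How the function reaches the notebooks is the part to verify against your runtime and stdlib versions: new Runtime accepts a builtins object whose properties become builtin variables, so the idea is to keep the standard Library’s builtins but override require with a definition that returns this function, e.g. () => requireAsEsm.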
I have no idea how far that will get you in practice, but I believe that this way you’ll end up with a more light-weight and predictable implementation than running everything through a vm context (or worse, by sharing globals).
And if you find the Runtime API challenging for more complex tasks, you may also want to take a look at this abstraction:
It feels to me like this would be much more natural to do in Observable Framework, since it uses vanilla JavaScript. There’s even support for exporting modules for easy embedding in other applications coming in the next release.