Struggling to get notebook cell values using JSDOM

mishatsvelik · September 11, 2024, 11:11pm

Hi all, I’m currently working on a project where Observable notebooks are integrated into a larger application as npm modules. The application passes data to the notebooks, which visualise it. What we’ve found is that the application needs to run parts of the notebook code itself to access certain cell values across multiple instances of the notebooks, for example the maximum value shown in each visualisation. I’m trying to set this up so that it does not require any rendering, to optimise speed. My instinct is to use JSDOM. I’ve tried everything I can think of but cannot get past a ‘document is not defined’ error. To reproduce, take a look at this repo and:

1. cd into observable-jest-playground\build
2. In the CLI, run: node index.js
Here is the full error message I get:

return new Inspector(container.appendChild(document.createElement("div")));
                                               ^

ReferenceError: document is not defined
    at file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/@observablehq/inspector/src/index.js:57:48
    at define (file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/5ef68b6d8020c0c7/5ef68b6d8020c0c7@106.js:279:17)
    at Runtime.runtime_module (file:///C:/Users/misha/ngviz/observable-jest-playground/node_modules/@observablehq/runtime/src/runtime.js:64:5)
    at file:///C:/Users/misha/ngviz/observable-jest-playground/build/index.js:14:22
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25)

Any ideas?
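
Judging by the trace, the failure seems to come from the observer rather than from the notebook cells: the compiled notebook's define() calls the observer once per cell, and the function returned by Inspector.into(...) reaches for the global document. The sketch below reproduces that shape with stand-ins only. `intoLike`, `define` and `tryDefine` are hypothetical names, not the real Inspector or notebook code.

```javascript
// Stand-in for Inspector.into(container), simplified from the line in the trace.
function intoLike(container) {
  return () => container.appendChild(document.createElement("div"));
}

// Stand-in for a compiled notebook's define(): it asks the observer for a
// per-cell observer before defining each cell.
function define(runtime, observer) {
  observer("max");
}

// Runs define() with the given observer and reports what happened.
function tryDefine(observer) {
  try {
    define(null, observer);
    return "ok";
  } catch (error) {
    return error.name;
  }
}

// In plain Node there is no global `document`, so the DOM-backed observer throws.
console.log(tryDefine(intoLike({}))); // ReferenceError
// An observer that never touches the DOM gets through.
console.log(tryDefine(() => undefined)); // ok
```

If that reading is right, the cells themselves may not need a DOM at all; only the observer does.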

mishatsvelik · September 11, 2024, 11:13pm

PS note - contrast this with the files in the tests folder, which do return cell values (tests are run using Jest).

mootari · September 13, 2024, 11:48am

Can you say more about your project’s scope? For example, do you plan to support any notebook, or do they need to be written specifically for this environment?

mishatsvelik · September 13, 2024, 11:53am

I would like to support any notebook if possible, but what I’m particularly interested in is being able to run JavaScript cells and access their values through JSDOM (HTML and Markdown cells are of secondary importance). I tend to use the d3, arquero and Observable Plot libraries.

mootari · September 15, 2024, 10:38am

What are your expectations regarding security and isolation? Jest runs jsdom with the runScripts: "dangerously" option, which (from what I understand) means that they share the global context and can execute arbitrary code within the Node environment and with the process’ permissions.

mishatsvelik · September 15, 2024, 5:33pm

So it would be preferable not to allow that, but if it’s a necessity to make this work then I don’t think it’s a deal-breaker. The JSDOM documentation states: “Again we emphasize to only use this when feeding jsdom code you know is safe. If you use it on arbitrary user-supplied code, or code from the Internet, you are effectively running untrusted Node.js code, and your machine could be compromised.” A security upgrade I am going to make in our project is to set the notebooks installed in it to “private” (and use API keys to install them as NPM modules), as a guardrail from the Observable end.

mishatsvelik · September 25, 2024, 4:23pm

Does anyone know if this is possible? Or, if it isn’t, is converting to Framework the only way forward?

mootari · September 28, 2024, 6:43pm

My recommendation would be to abandon the idea of a generalized solution and instead tailor both your notebooks and consuming node scripts to this use case. In detail:

Your notebooks should only ever use dynamic imports, never require().
Your node environment should pass a customized Library that replaces UMD requires with ES imports.
Your module instantiation should not pass an observer.
If you still need to shim via jsdom, it should not execute scripts.

I have no idea how far that will get you in practice, but I believe that this way you’ll end up with a more light-weight and predictable implementation than running everything through a vm context (or worse, by sharing globals).
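
The first three recommendations above could look roughly like the following in a Node script. This is a minimal sketch, not a tested integration: `withImportBasedRequire` and `readCellValues` are hypothetical helper names, the notebook package name in the usage comment is a placeholder, and it assumes that the Runtime accepts a plain object of builtins and treats function-valued builtins as definitions (which is how the standard library appears to declare `require`).

```javascript
// Hypothetical helper: copies the given builtins and swaps `require` for a
// version backed by Node's native dynamic import() instead of a UMD loader.
function withImportBasedRequire(baseBuiltins) {
  const requireViaImport = async (name) => {
    const namespace = await import(name);
    // Mimic require(): prefer the default export when the module has one.
    return namespace.default ?? namespace;
  };
  // Assumption: a function-valued builtin is invoked as a definition, so the
  // builtin returns the require function rather than being it.
  return { ...baseBuiltins, require: () => requireViaImport };
}

// Hypothetical helper: instantiates the notebook WITHOUT an observer, so no
// Inspector (and no `document`) is involved, then reads the named cells.
async function readCellValues(runtime, define, cellNames) {
  const main = runtime.module(define); // no second argument: nothing is rendered
  // module.value(name) resolves with the cell's next value and makes the
  // runtime compute that cell even though nothing observes it.
  const values = await Promise.all(cellNames.map((name) => main.value(name)));
  return Object.fromEntries(cellNames.map((name, i) => [name, values[i]]));
}

// Usage sketch (not executed here; the notebook package name is a placeholder):
//   import {Runtime, Library} from "@observablehq/runtime";
//   import define from "your-notebook-package";
//   const runtime = new Runtime(withImportBasedRequire(new Library()));
//   const {max} = await readCellValues(runtime, define, ["max"]);
```

Passing the runtime and the define function in as arguments keeps the helpers independent of how the notebook was installed. Whether constructing a Library works without any DOM shim in Node is something to verify in your own environment.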

And if you find the Runtime API challenging for more complex tasks, you may also want to take a look at this abstraction:

mbostock · September 28, 2024, 8:02pm

Feels to me like it will be much more natural to do this in Observable Framework since it uses vanilla JavaScript. There’s even support for exporting modules for easy embedding in other applications coming in the next release.

https://observablehq.com/framework/embeds

mishatsvelik · October 2, 2024, 6:00pm

Thank you for the advice, I shall explore this direction upon the next release.

mbostock · October 2, 2024, 7:25pm

It was released yesterday! Let us know how it goes. Releases · observablehq/framework · GitHub
