Inspecting cell content in a page

jwoLondon · June 16, 2023, 9:14am

What options (if any) are there for programmatically inspecting the cells in an Observable page and their contents? I would like to write JavaScript functions that can, for example:

Determine if a (markdown) cell exists on a page containing the word XXX
Determine if a (markdown) cell exists on a page that starts with the heading # YYY (or if inspecting rendered content, the first line is YYY)
Determine the word count of a (markdown) cell identified as above.
And more generally, be able to reference a markdown cell programmatically and extract/analyse its contents.

My immediate use cases would involve inspecting markdown cells, but if it were possible to do the same with HTML and LaTeX cells too, even better.

While I am comfortable working with Observable, I have less experience of working with the DOM, so don’t know if any of the above are trivial or impossible.

mootari · June 16, 2023, 12:32pm

You can’t inspect cells, only the rendered contents. As such you won’t be able to tell a Markdown cell from a JavaScript cell without actively marking the cell output.

I would recommend naming your cells and referring to them by name. Then you can simply search a cell output’s .textContent property.

To determine whether a heading is the first element, you can query for it, e.g. via “& > h2:first-child”.
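
A minimal sketch of the named-cell approach, assuming each named Markdown cell evaluates to a DOM element. The helper names (wordCount, containsWord, startsWithHeading) are hypothetical and not part of any Observable API. The heading check uses firstElementChild rather than a selector, and also accepts the case where the cell's value is itself the heading element, which may happen when a cell contains nothing but a heading.

```javascript
// Hypothetical helpers: `element` is the rendered output of a named cell.

// True for h1..h6 elements, false for anything else (including null).
const isHeading = (node) => Boolean(node) && /^H[1-6]$/.test(node.tagName);

// Number of whitespace-separated words in the rendered text.
function wordCount(element) {
  const text = element.textContent.trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

// Case-insensitive whole-word search in the rendered text.
function containsWord(element, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(element.textContent);
}

// True if the output is, or begins with, a heading whose text equals `title`.
function startsWithHeading(element, title) {
  const first = isHeading(element) ? element : element.firstElementChild;
  return isHeading(first) && first.textContent.trim() === title;
}
```

With a Markdown cell named intro (an assumed name), another cell could then evaluate wordCount(intro) or startsWithHeading(intro, "YYY") and would re-run whenever intro changes.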

There are other alternatives, like parsing a public notebook’s source from its compiled module or fetching the actual source via a CORS proxy.

It’s difficult to make any recommendations without having more details about the why though.

jwoLondon · June 16, 2023, 1:58pm

Thanks for those pointers.

I am exploring the possibility of creating narrative schemas for Observable notebooks. That is, validating a notebook against a set of user-defined rules that, for example, require the notebook to contain certain headings, or that a cell’s contents have a minimum or maximum word count.

It may be that the way forward is to do this via named cells and, as you suggest, use .textContent, but I am open to other approaches if easier for the document author.
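
To make the idea concrete, here is a rough sketch of what such a rule check might look like when built on named cells and .textContent. The rule format (cell, minWords, maxWords, contains) and the function name validateNotebook are invented for illustration only.

```javascript
// Hypothetical validator: `cells` maps cell names to their rendered elements,
// `rules` is a list of constraints on the rendered text of those cells.
function validateNotebook(cells, rules) {
  const problems = [];
  for (const rule of rules) {
    const element = cells[rule.cell];
    if (!element) {
      problems.push(`${rule.cell}: cell not found`);
      continue;
    }
    const text = element.textContent.trim();
    const count = text === "" ? 0 : text.split(/\s+/).length;
    if (rule.minWords !== undefined && count < rule.minWords) {
      problems.push(`${rule.cell}: ${count} words, expected at least ${rule.minWords}`);
    }
    if (rule.maxWords !== undefined && count > rule.maxWords) {
      problems.push(`${rule.cell}: ${count} words, expected at most ${rule.maxWords}`);
    }
    if (rule.contains !== undefined && !text.includes(rule.contains)) {
      problems.push(`${rule.cell}: missing required text "${rule.contains}"`);
    }
  }
  return problems; // an empty array means every rule passed
}
```

In a notebook this might be called as validateNotebook({intro, methods}, rules), where intro and methods are assumed cell names. Referencing the cells by name means the check would re-run whenever either cell changes.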

mbostock · June 16, 2023, 2:01pm

I think you can do most of this stuff with DOM mutation observers. For example, there’s a table of contents component I wrote:

The D3 gallery also does some kind of interesting stuff to accumulate the gallery contents so we can show the number of entries (see the links cell in the appendix):

Another fun example:

There’s another technique which is even more obscure, which is to listen to the events that Observable sends to the worker iframe to evaluate your notebook. That gives you not just the output (rendered content) of each cell, but also the (compiled) source code, which you could statically analyze. That would be a pretty advanced approach, however, and not something we would consider a public API. So mutation observers are probably the way to go. Fabian points out that we no longer allow the event-listening technique.
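
For readers less familiar with the DOM, a minimal sketch of the mutation-observer approach follows. It is not the code from the components mentioned above. The names watchHeadings, root and onChange are hypothetical; MutationObserver is the standard browser API.

```javascript
// Sketch: call `onChange` with the current heading texts whenever anything
// under `root` is added, removed, or edited.
function watchHeadings(root, onChange) {
  const collect = () =>
    Array.from(root.querySelectorAll("h1, h2, h3"), (h) => h.textContent.trim());

  const observer = new MutationObserver(() => onChange(collect()));
  observer.observe(root, { childList: true, subtree: true, characterData: true });

  onChange(collect()); // report the initial state once
  return () => observer.disconnect(); // call the returned function to stop observing
}
```

Assuming document.body contains the rendered cells, a cell could call watchHeadings(document.body, callback) and pass the returned function to invalidation.then so that the observer is disconnected when the cell re-runs.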
