Inspecting cell content in a page

What options (if any) are there for programmatically inspecting the cells in an Observable page and their contents? I would like to write JavaScript functions that can, for example:

  • Determine if a (markdown) cell exists on a page containing the word XXX
  • Determine if a (markdown) cell exists on a page that starts with the heading # YYY (or, if inspecting rendered content, whose first line is YYY)
  • Determine the word count of a (markdown) cell identified as above.
  • And more generally, be able to reference a markdown cell programmatically and extract/analyse its contents.

My immediate use cases would involve inspecting markdown cells, but if it were possible to do the same with HTML and LaTeX cells too, even better.

While I am comfortable working with Observable, I have less experience working with the DOM, so I don’t know whether any of the above are trivial or impossible.

You can’t inspect cells, only the rendered contents. As such you won’t be able to tell a Markdown cell from a JavaScript cell without actively marking the cell output.
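One way to actively mark an output is to attach a data attribute when the cell is defined, then find marked outputs with a selector. The sketch below is hypothetical: markCell, findMarked and the data-cell-kind / data-cell-name attributes are illustrative names, not part of Observable. It relies only on the standard dataset and querySelectorAll DOM APIs.

```javascript
// Hypothetical helper: tag a rendered output element so it can be found later.
// Returns the element unchanged, so it can wrap a cell's value inline.
function markCell(el, kind, name = "") {
  el.dataset.cellKind = kind; // rendered as the data-cell-kind attribute
  if (name) el.dataset.cellName = name; // rendered as data-cell-name
  return el;
}

// Collect every marked output of the given kind below `root`.
// Assumes `kind` contains no quote characters.
function findMarked(root, kind) {
  return Array.from(root.querySelectorAll(`[data-cell-kind="${kind}"]`));
}

// Possible usage in Observable (cell names are illustrative):
//   intro = markCell(md`# YYY ...`, "markdown", "intro")
//   markdownOutputs = findMarked(document.body, "markdown")
```

Note that findMarked only sees outputs that are already in the document when it runs.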

I would recommend naming your cells and referring to them by name. Then you can simply search a cell output’s .textContent property.
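Once a markdown cell is named (say intro, a hypothetical name), its value is the rendered element, so the word search and word count from the question reduce to string tests on intro.textContent. A minimal sketch, assuming words are separated by whitespace. Keep in mind that textContent does not add whitespace between block elements, so a heading and paragraph with no whitespace between them in the markup could run together.

```javascript
// Count whitespace-separated words. A rough heuristic: "don't" and
// "state-of-the-art" each count as one word.
function wordCount(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Case-insensitive whole-word search. Regex metacharacters in `word` are
// escaped; the \b boundaries assume `word` starts and ends with a word
// character (letter, digit or underscore).
function containsWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}\\b`, "i").test(text);
}

// Possible usage with a named markdown cell `intro` (hypothetical name):
//   containsWord(intro.textContent, "XXX")
//   wordCount(intro.textContent)
```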

To determine whether a heading is the first element, you can query for it with a selector, e.g. “& > h2:first-child”.
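That check could be wrapped in a function along these lines. It uses the standard :scope pseudo-class to anchor the child combinator at the cell’s own element, and it also tests the element itself, on the assumption that a single-block markdown cell may render as the bare heading element rather than a wrapper. The function name and the level parameter are illustrative.

```javascript
// Returns true if the rendered markdown output `cellEl` starts with an
// h<level> heading whose text equals `title` (ignoring surrounding whitespace).
function startsWithHeading(cellEl, title, level = 1) {
  const tag = `h${level}`;
  // Assumption: a single-block cell may render as the heading element itself,
  // while multi-block cells render as a wrapper whose first child is the heading.
  const heading = cellEl.matches(tag)
    ? cellEl
    : cellEl.querySelector(`:scope > ${tag}:first-child`);
  return heading !== null && heading.textContent.trim() === title;
}

// "# YYY" in markdown renders as an <h1>, so for the question's example:
//   startsWithHeading(intro, "YYY")
```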

There are other alternatives, like parsing a public notebook’s source from its compiled module or fetching the actual source via a CORS proxy.
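As a rough illustration of the compiled-module route, the sketch below assumes that a public notebook is served as an ES module at https://api.observablehq.com/@user/notebook.js?v=3, that markdown cells appear in it as template literals tagged with md, and that the API permits cross-origin requests from a notebook. These are assumptions about an undocumented format, not a supported API; extractMarkdownCells and fetchMarkdownCells are hypothetical names.

```javascript
// Pull the bodies of md-tagged template literals out of a compiled module's
// source text. Assumes backticks inside markdown are escaped as \` and that
// ${...} interpolations contain no backticks of their own.
function extractMarkdownCells(source) {
  const pattern = /\bmd`((?:\\[\s\S]|[^`\\])*)`/g;
  return Array.from(source.matchAll(pattern), (m) =>
    // Undo the template-literal escapes for \`, \\ and \$.
    m[1].replace(/\\([`\\$])/g, "$1")
  );
}

// Assumption: public notebooks are served as compiled ES modules at this URL
// and the API allows cross-origin requests. `slug` looks like "@user/notebook".
async function fetchMarkdownCells(slug) {
  const res = await fetch(`https://api.observablehq.com/${slug}.js?v=3`);
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  return extractMarkdownCells(await res.text());
}
```

This returns markdown source rather than rendered text, so a heading check becomes a test on whether the string starts with "# YYY".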

It’s difficult to make any recommendations without more details about why you need this, though.

Thanks for those pointers.

I am exploring the possibility of creating narrative schemas for Observable notebooks: that is, validating a notebook against a set of user-defined rules that, for example, require the notebook to contain certain headings, or require a cell’s contents to have a minimum or maximum word count.
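Rules like these could be expressed as a small schema object and checked with a plain function. In the sketch below, the schema shape (requiredHeadings, wordLimits), the cell records ({ name, heading, text }) and validateNarrative itself are all hypothetical, chosen only to illustrate the idea.

```javascript
// Word count as whitespace-separated tokens (a rough heuristic).
function wordCount(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Check hypothetical cell records { name, heading, text } against a
// hypothetical schema { requiredHeadings, wordLimits }. Returns a list of
// human-readable problems; an empty list means the notebook passes.
function validateNarrative(cells, schema) {
  const errors = [];
  const headings = new Set(cells.map((c) => c.heading).filter(Boolean));
  for (const h of schema.requiredHeadings ?? []) {
    if (!headings.has(h)) errors.push(`Missing heading: ${h}`);
  }
  for (const [name, limits] of Object.entries(schema.wordLimits ?? {})) {
    const { min = 0, max = Infinity } = limits;
    const cell = cells.find((c) => c.name === name);
    if (!cell) {
      errors.push(`Missing cell: ${name}`);
      continue;
    }
    const n = wordCount(cell.text);
    if (n < min) errors.push(`${name}: ${n} words, minimum is ${min}`);
    if (n > max) errors.push(`${name}: ${n} words, maximum is ${max}`);
  }
  return errors;
}
```

In a notebook, each record could be built from a named cell, taking text from its textContent and heading from its first h1 or h2 element.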

It may be that the way forward is to do this via named cells and, as you suggest, use .textContent, but I am open to other approaches if they are easier for the document author.

I think you can do most of this stuff with DOM mutation observers. For example, there’s a table of contents component I wrote:

The D3 gallery also does some kind of interesting stuff to accumulate the gallery contents so we can show the number of entries (see the links cell in the appendix):

Another fun example:

There’s another, even more obscure technique: listening to the events that Observable sends to the worker iframe to evaluate your notebook. That gives you not just the output (rendered content) of each cell, but also the (compiled) source code, which you could statically analyze. That would be a pretty advanced approach, however, and not something we would consider a public API, so mutation observers are probably the way to go. Fabian points out that we no longer allow this technique.
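A minimal sketch of the mutation-observer approach suggested above: re-run a check whenever the rendered outputs under some root element change. watchOutputs is a hypothetical helper; MutationObserver and its childList, subtree and characterData options are the standard DOM API, and invalidation (used in the comment) is Observable’s built-in promise that resolves when the current cell is about to be re-evaluated. Observing document.body assumes the outputs and the watching cell share the same document.

```javascript
// Hypothetical helper: call onChange(root) now, and again (debounced by
// `delay` ms) whenever nodes are added or removed, or text changes, anywhere
// under root. Returns a function that stops watching.
function watchOutputs(root, onChange, delay = 100) {
  let timer = null;
  const observer = new MutationObserver(() => {
    clearTimeout(timer);
    timer = setTimeout(() => onChange(root), delay);
  });
  observer.observe(root, { childList: true, subtree: true, characterData: true });
  onChange(root);
  return () => {
    clearTimeout(timer);
    observer.disconnect();
  };
}

// Possible usage inside an Observable cell, re-checking the notebook whenever
// any output changes and cleaning up when the cell re-runs:
//   const stop = watchOutputs(document.body, (root) => { /* run checks */ });
//   invalidation.then(stop);
```

The debounce avoids running the check once per mutation while cells are rendering. If onChange writes a report into root itself, that write will trigger the observer again, so it is safer to display results outside the observed subtree.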