What options (if any) are there for programmatically inspecting the cells in an Observable page and their contents? I would like to write JavaScript functions that can, for example:
Determine if a (markdown) cell exists on a page containing the word XXX
Determine if a (markdown) cell exists on a page that starts with the heading # YYY (or if inspecting rendered content, the first line is YYY)
Determine the word count of a (markdown) cell identified as above.
And more generally, be able to reference a markdown cell programmatically and extract/analyse its contents.
My immediate use cases would involve inspecting markdown cells, but if it were possible to do the same with HTML and LaTeX cells too, even better.
While I am comfortable working with Observable, I have less experience of working with the DOM, so I don't know whether any of the above is trivial or impossible.
You can't inspect cells, only their rendered contents. As such, you won't be able to tell a Markdown cell from a JavaScript cell without actively marking the cell output.
I would recommend naming your cells and referring to them by name. Then you can simply search a cell output's .textContent property.
To determine whether a heading is the first element, you can query for it with a selector such as "& > h2:first-child" (or ":scope > h2:first-child" when calling querySelector on the cell output).
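The same "first element child is a heading" check can also be written without a selector. This sketch (startsWithHeading is a hypothetical name) walks firstElementChild instead. It also assumes that md`…` may return the heading element itself when the heading is the only top-level node, and a wrapping element otherwise, so it checks both shapes.

```javascript
// Hypothetical helper: does the rendered output of a named Markdown cell start with
// a heading of the given level whose text is `title`? For md`# YYY` that is level 1.
// Assumption: the cell value is either the heading itself or a wrapper element
// whose first element child is the heading.
function startsWithHeading(el, title, level = 1) {
  const tag = `H${level}`; // tagName is upper-case for HTML elements
  // Use the element itself if it is the heading, else its first element child.
  const first = el.tagName === tag ? el : el.firstElementChild;
  return first != null && first.tagName === tag && first.textContent.trim() === title;
}
```

For example, startsWithHeading(intro, "YYY") would be true for a cell intro = md`# YYY` followed by more paragraphs.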
I am exploring the possibility of creating narrative schemas for Observable notebooks: that is, validating a notebook against a set of user-defined rules that, for example, require the notebook to contain certain headings, or require a cell's contents to have a minimum or maximum word count.
It may be that the way forward is to do this via named cells and, as you suggest, use .textContent, but I am open to other approaches if they are easier for the document author.
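With named cells, a narrative schema could plausibly be a plain list of rules checked against the cells' rendered text. The rule shape ({cell, startsWith, minWords, maxWords}) and the validateNarrative name below are purely hypothetical, a sketch of the idea rather than an existing Observable feature.

```javascript
// Hypothetical "narrative schema" check. `cells` maps names to rendered cell
// outputs (anything with .textContent); each rule names a cell plus optional
// constraints. Returns a list of human-readable problems.
function validateNarrative(cells, rules) {
  const problems = [];
  for (const rule of rules) {
    const el = cells[rule.cell];
    if (el == null) {
      problems.push(`${rule.cell}: cell not found`);
      continue;
    }
    const text = el.textContent.trim();
    // Rough word count: whitespace-separated tokens.
    const words = text === "" ? 0 : text.split(/\s+/).length;
    if (rule.startsWith != null && !text.startsWith(rule.startsWith)) {
      problems.push(`${rule.cell}: should start with "${rule.startsWith}"`);
    }
    if (rule.minWords != null && words < rule.minWords) {
      problems.push(`${rule.cell}: ${words} words, minimum is ${rule.minWords}`);
    }
    if (rule.maxWords != null && words > rule.maxWords) {
      problems.push(`${rule.cell}: ${words} words, maximum is ${rule.maxWords}`);
    }
  }
  return problems; // an empty array means every rule passed
}
```

In a notebook this might be called as validateNarrative({intro, method}, rules). Passing the named cells as an object literal means the validation cell depends on them and re-runs whenever the author edits one of them.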
I think you can do most of this stuff with DOM mutation observers. For example, there's a table of contents component I wrote:
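A minimal sketch of the mutation-observer idea, assuming the notebook's cells render into document.body: collect the headings whenever the DOM changes and report them only when the list actually differs. The watchHeadings name and its parameters are hypothetical. Skipping unchanged lists matters because displaying the result also mutates the DOM, which would otherwise trigger the observer again in an endless loop.

```javascript
// Hypothetical helper: call `onChange` with the text of all headings under `root`,
// initially and whenever the subtree changes. Returns a function that stops observing.
function watchHeadings(root, onChange, selector = "h1, h2, h3") {
  let last = null;
  const report = () => {
    const texts = Array.from(root.querySelectorAll(selector), (h) => h.textContent.trim());
    // Only notify when the list changed; rendering the result mutates the DOM too,
    // so notifying unconditionally could cause a feedback loop.
    const key = JSON.stringify(texts);
    if (key !== last) {
      last = key;
      onChange(texts);
    }
  };
  const observer = new MutationObserver(report);
  // Watch for nodes being added or removed anywhere below `root`.
  observer.observe(root, { childList: true, subtree: true });
  report(); // report the initial state once
  return () => observer.disconnect();
}
```

In a notebook cell, something like headings = Generators.observe(change => watchHeadings(document.body, change)) would turn this into a reactive value. Generators.observe calls the returned disconnect function when the cell is invalidated.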
The D3 gallery also does some kind of interesting stuff to accumulate the gallery contents so we can show the number of entries (see the links cell in the appendix):
Another fun example:
There's another, even more obscure technique: listening to the events that Observable sends to the worker iframe to evaluate your notebook. That gives you not just the output (rendered content) of each cell, but also the (compiled) source code, which you could statically analyze. That would be a pretty advanced approach, however, and not something we would consider a public API, so mutation observers are probably the way to go. Fabian points out that we no longer allow this technique.