Roboco-op notebook coding assistant

I just completed some serious upgrades to a long-running project of mine for notebooks. It's a coding assistant that reads all the cells in the notebook to build an LLM context and answers questions by inserting cells. This has a few nice features: you can correct its code and get immediate feedback on whether it works, your conversation always stays in sync with the codebase, and you can paste cells from other pre-trained notebooks to add in-context learned skills.

The latest upgrade adds a function called highlight() that programmatically adds its arguments to the LLM context, so now Roboco-op can use both source code cells AND runtime values to reply.

Getting values into the LLM context is a game changer. By feeding a test-suite report into its context, it can do incremental test-driven development. It produces tested code!
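To make that concrete, here is a minimal sketch of the pattern (highlight() is the real function; runTests() and the report shape are hypothetical stand-ins for whatever test suite you have in the notebook):

```js
// Run a (hypothetical) test suite and surface the report to the assistant.
// highlight() is Roboco-op's API; runTests() and its report shape are
// illustrative assumptions, not part of Roboco-op.
const report = await runTests(); // e.g. {passed: 12, failed: 1, failures: [...]}
highlight(report); // the report is now part of the LLM context alongside the cells
```

On the next request, the assistant sees the failures and can propose a fix, closing the loop between writing code and observing it run.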


Roboco-op improves within a session once it has enough context to learn your programming style and Observable idioms. There is an annoying cold-start problem, though: on a fresh notebook it doesn't have enough context to program well.

So to address this and more, I added a RAG extension. I built an index from my notebook backups and created a vector similarity search at the cell level. You can play with just the index + search at Observable Notebook Compressed RAG Index + Vector Search / Tom Larkworthy | Observable, but the whole thing is now included by default in the Roboco-op starting notebook.

So now when you ask for a cell, it always has four extra example cells to guide it a little. Hopefully they are relevant examples, but even if not, they still help just by being concrete Observable programming examples.
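For anyone unfamiliar, cell-level vector search reduces to something like the following sketch (generic cosine-similarity ranking; the actual index format and embedding model are in the linked notebook, not shown here):

```js
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed cells against a query embedding, keep the top k.
function topK(queryEmbedding, index, k = 4) {
  return index
    .map((cell) => ({...cell, score: cosine(queryEmbedding, cell.embedding)}))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Here k = 4 matches the four example cells mentioned above.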

Building the RAG index optimized for clientside use was interesting. My first attempt weighed 14MB, but with PCA I golfed it down to 3.5MB, which is not bad for a 3000-cell knowledge base.
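The savings are mostly dimensionality: as a rough illustration, 3000 cells of 1536-dim float32 embeddings is about 18MB raw, while projecting to around 256 PCA dimensions lands near 3MB (the real index's exact dimensions and encoding aren't stated here, so treat these numbers as an assumption). Applying a precomputed projection clientside is cheap:

```js
// Project an embedding onto k precomputed principal axes.
// `mean` (length d) and `components` (k arrays of length d) would be
// computed offline while building the index; sizes are illustrative.
function pcaProject(embedding, mean, components) {
  const centered = embedding.map((v, i) => v - mean[i]);
  return Float32Array.from(
    components,
    (axis) => axis.reduce((sum, w, i) => sum + w * centered[i], 0)
  );
}
```

Both the stored index vectors and incoming query embeddings get projected through the same axes, so similarity search still works in the reduced space.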

The next version of Robocoop is here

Quite a significant overhaul since v1. It now understands Notebook 1.0 syntax properly. It can output multiple cells. It puts the entire current notebook into context, and optionally all dependents. I removed RAG but kept highlighting runtime values for test-driven development. The prompt has been optimised by Trying out "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" / Tom Larkworthy | Observable, so it has a better default system prompt, although that remains a work in progress (I need more evals really).

It is much easier to set up now: you import just a single cell and then include that cell in your work notebook.
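Something along these lines (the notebook path and cell name are placeholders for illustration, not the real identifiers; check the Robocoop notebook for the actual import):

```js
// Cell 1 — import the assistant (placeholder path and name):
import {robocoop} from "observable:@tomlarkworthy/robocoop-2";
```

```js
// Cell 2 — include the imported cell where you want the UI to appear:
robocoop
```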

I am currently working on a potentially more powerful agentic coding assistant. But I find Robocoop 2.0 still quite useful: it's so much faster, and a better option than going to ChatGPT, that I end up using Robocoop 2 by default. Robocoop 2.0 is good for small pieces of code once the coding pattern is established; it's more collaborative. The agentic approach can do much bigger things but is kinda slow to work with for the moment, so I don't use it as much. So I thought I would get Robocoop-2 working better while I work on the bigger upgrade.

It just got a websearch tool as well, so now it can do some research before answering. Good for helping it know how to use APIs.

A few extra context controls have shipped for Robocoop-2. You can include a screenshot of the notebook for visual tasks, and select additional modules to include. This is quite useful for applying a pattern from one notebook to another, or for giving it the context to understand how to use an imported notebook. As it can output multiple cells, it can one-shot some pretty big notebooks. It will run up to GPT-5.2 now. I use it a lot.

Turns out the additional modules + GPT-5.2 combo is crazy powerful. You can one-shot whole notebooks now.

I made this one like that: I pulled in the arc-diagram example and then remixed it in my own domain.

How does it compare to the AI that is built into Observable Desktop, or using something like Claude Code to generate notebook files with the new Observable Notebook Kit file format?

1. A difference is that Claude Code and Observable Desktop do not have access to the runtime variables.

Given:

```js
const myrandom = Math.random()
```
User: What is the value of Random?

Observable Desktop: Looking at the selected cell (id=11), I can see that myrandom displays the value that was generated by Math.random() in cell 10. However, I can’t see the actual runtime value from the code alone - Math.random() generates a different random number between 0 and 1 each time the cell runs.

The value of myrandom is whatever random number was generated when cell 10 executed, which would be a decimal number between 0 (inclusive) and 1 (exclusive). To see the current value, you’d need to look at the output of cell 11 in your notebook.

Robocoop-2: 0.4938541003720692

tom: Robocoop-2 uses the inspector to expand values up to a size limit (SummarizeJS / Tom Larkworthy | Observable), so it sees what a user would see when they click expand. It leans on the observation that you already put a lot of thought into how to represent JavaScript values textually (which I later realized is somewhat like the Chrome debugging idioms).
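A toy sketch of the idea (not SummarizeJS itself, whose actual behaviour and API are in the linked notebook):

```js
// Toy approximation of size-limited value summarization (not SummarizeJS):
// expand nested objects and arrays until a character budget runs out.
function summarize(value, budget = 200) {
  if (value === null || typeof value !== "object")
    return String(JSON.stringify(value) ?? value).slice(0, budget);
  const entries = Array.isArray(value)
    ? value.map((v, i) => [i, v])
    : Object.entries(value);
  const per = Math.max(8, Math.floor(budget / Math.max(1, entries.length)));
  const parts = [];
  let used = 2; // account for the enclosing brackets
  for (const [k, v] of entries) {
    const piece = `${k}: ${summarize(v, per)}`;
    if (used + piece.length > budget) { parts.push("…"); break; }
    parts.push(piece);
    used += piece.length + 2;
  }
  return Array.isArray(value) ? `[${parts.join(", ")}]` : `{${parts.join(", ")}}`;
}
```

With any reasonable budget, summarize(myrandom) yields the literal number, which is exactly the kind of answer Robocoop-2 gave above.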

  2. Robocoop-2 has access to imported module code.

Given:

```js
import {samegroup} from "observable:@observablehq/plot-arc-diagram";
miserables
```

User: draw me an arc diagram of miserables links

Observable Desktop: (tom: the import doesn’t help)

Robocoop-2: (after adding plot-arc-diagram as an additional module, using gpt-5.2)

Link to the experiment: Robocoop Comparison / Tom Larkworthy | Observable

Robocoop-2: (after adding plot-arc-diagram as an additional module, using gpt-5-mini)

tom: I did three tries. The first produced an SVG that was wrong. The second did the right plot. The third wrote non-Observable-JS code which, when corrected (placing the JS script inside {}), did the right thing. So GPT-5.2 is definitely helping compared to a weaker model, but even gpt-5-mini can solve the task.

tom: Desktop cannot do the task because it doesn’t have the example in context. So to make things contextually fairer, I pasted the arc-diagram example into Desktop. Given the example already uses the miserables dataset, the question is already directly answered, yet the assistant chose the SVG method.

So I changed the prompt to explicitly ask it to use Plot:

User: draw me an arc diagram of miserables links using Plot

Desktop:

tom: I have no idea why it is not using the answer it literally has spelled out at the top of the screen there. It’s gone back to hallucinating arcs.

Claude Code (Opus 4.5):

tom: which led to a good diagram, so Claude can leverage examples properly (but it can’t read runtime values because no notebook is executing).
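For reference, the Plot pattern the assistants were reaching for looks roughly like this (reconstructed from the public plot-arc-diagram example; option details may differ, and samegroup comes from the imported example):

```js
Plot.plot({
  height: 1080,
  marginLeft: 100,
  axis: null,
  x: {domain: [0, 1]}, // nodes sit at x = 0, arcs bow out to the right
  marks: [
    Plot.dot(miserables.nodes, {x: 0, y: "id", fill: "group"}),
    Plot.text(miserables.nodes, {x: 0, y: "id", text: "id", textAnchor: "end", dx: -6}),
    Plot.arrow(miserables.links, {
      x: 0, y1: "source", y2: "target",
      bend: 90, headLength: 0, // a bent, headless arrow renders as an arc
      stroke: samegroup, sort: samegroup, reverse: true
    })
  ]
})
```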

So there are differences in what context is available, and some differences based on the foundation model used. Claude Code is amazing; it's my main driver at work. But I think there is alpha in having the LLM read runtime variables directly from the runtime, which is the main reason I persist with Robocoop, which I started in 2023. Robocoop-2 is a significant upgrade over the original because:

  • it understands Observable syntax properly
  • it's been through a prompt optimisation step (GEPA)
  • it has better default context (runtime values, websearch)
  • it has better optional context controls (additional dependencies, screenshots)

Each of those steps improved it a lot. Claude Code can be configured to do most of those things too with plugins, and it can do parallel tool calls and iterative workflows, but I like that Roboco-op lives in userspace with an MIT license and is naturally sandboxed by being in a webpage.

have you seen solve it (https://solve.it.com/)?
jeremy howard from fast.ai, who was also pushing nbdev (https://nbdev.fast.ai/) for iterative development, is behind it.

i think they have correctly identified that interleaving feedback from running the code with writing the code is very powerful. this is true for both human coders and coding agents.

exactly this aspect seems to be tackled by Robocoop-2.

i have not been working on any notebooks lately (mostly app development), but definitely planning to check out this project once i return to observable next.

Agreed, and we’ve been planning on doing this for a while! I’ll keep you posted if we make progress.
