A chord diagram of concepts and relationships extracted from 12 lectures of graduate stochastic analysis. 96 concepts, 534 relationships — no predefined ontology, no labeled training data. Structure emerges from the text.
Why this question
Can you recover the conceptual structure of a mathematical field without knowing what field you’re reading? The pipeline doesn’t know it’s processing stochastic analysis — it sees text, runs repeated extractions, and scores concepts and relationships by how consistently they appear across runs. Confidence is frequency, not a classifier output. This is early work; the current corpus is one course. The next step is a second domain to test whether the approach surfaces genuine cross-domain connections.
Methodology
Each lecture is processed with n=10 extraction runs. Concept confidence is the fraction of runs that mention the concept cluster, an idea borrowed from semantic entropy (Farquhar et al. 2024). Synonym clustering uses tiered entailment: two small models vote first, and a larger model breaks the tie when they disagree. A second-pass semantic merge recovers concepts fragmented across naming variants: “infinitesimal generator”, “generator (diffusion)”, and “generator” collapse to a single node.
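The scoring described above can be sketched in a few lines. This is a minimal illustration in Python, not the pipeline's Julia code: the judge callables, the `canonical` mapping, and the sample runs are all hypothetical stand-ins for the LLM entailment checks and the clustering output.

```python
from collections import Counter


def same_concept(a, b, small_judge_1, small_judge_2, tiebreaker):
    """Tiered entailment vote: consult the larger model only on disagreement.

    Each judge is a hypothetical callable (a, b) -> bool standing in for an
    LLM entailment check; none of these names come from the pipeline.
    """
    first = small_judge_1(a, b)
    second = small_judge_2(a, b)
    if first == second:
        return first
    return tiebreaker(a, b)


def concept_confidence(runs, canonical):
    """Score each concept cluster by the fraction of runs that mention it.

    runs: list of sets of raw concept names, one set per extraction run.
    canonical: dict mapping a raw name to its cluster label (assumed output
        of the synonym-clustering step); unmapped names stand alone.
    """
    if not runs:
        return {}
    counts = Counter()
    for names in runs:
        # Map to clusters first so two variants in one run count only once.
        clusters = {canonical.get(name, name) for name in names}
        counts.update(clusters)
    return {cluster: hits / len(runs) for cluster, hits in counts.items()}


# Made-up data: 10 runs, with three naming variants of one concept.
canonical = {
    "generator": "infinitesimal generator",
    "generator (diffusion)": "infinitesimal generator",
}
runs = (
    [{"infinitesimal generator", "Lipschitz condition"}] * 4
    + [{"generator", "generator (diffusion)"}] * 3
    + [{"Ornstein-Uhlenbeck process"}] * 3
)
scores = concept_confidence(runs, canonical)
print(scores["infinitesimal generator"])  # 0.7
```

Counting clusters rather than raw names is what keeps a run that says both “generator” and “generator (diffusion)” from being counted twice.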
Implementation
The pipeline is written in Julia: LLM calls go through Ollama running locally, the graph is exported as GraphML, and per-lecture reports are merged in a post-processing step. Observable is a pure downstream consumer: all the heavy work is pre-computed offline, and the notebook only reads a static file.
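For readers unfamiliar with GraphML, here is a rough sketch of what such a static file might contain. It is written in Python with the standard library rather than Julia, and the attribute names (`name`, `label`, `confidence`), the graph id, and the choice of directed edges are assumptions for illustration, not the pipeline's actual schema.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(concepts, relationships):
    """Serialize scored concepts and relationships as a GraphML string.

    concepts: dict mapping concept name -> confidence in [0, 1].
    relationships: list of (source, target, label, confidence) tuples.
    The attribute layout here is hypothetical.
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # GraphML expects data attributes to be declared as <key> elements.
    ET.SubElement(root, "key", {"id": "name", "for": "node",
                                "attr.name": "name", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "label", "for": "edge",
                                "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "conf", "for": "all",
                                "attr.name": "confidence", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", id="concepts", edgedefault="directed")

    # Concept names contain spaces, so nodes get short generated ids.
    node_ids = {}
    for index, (name, confidence) in enumerate(concepts.items()):
        node_ids[name] = f"n{index}"
        node = ET.SubElement(graph, "node", id=node_ids[name])
        ET.SubElement(node, "data", key="name").text = name
        ET.SubElement(node, "data", key="conf").text = str(confidence)

    for source, target, label, confidence in relationships:
        edge = ET.SubElement(graph, "edge",
                             source=node_ids[source], target=node_ids[target])
        ET.SubElement(edge, "data", key="label").text = label
        ET.SubElement(edge, "data", key="conf").text = str(confidence)

    return ET.tostring(root, encoding="unicode")


# Made-up scores; the relationship text is quoted from the feedback below.
xml_text = to_graphml(
    {"Lipschitz condition": 1.0, "existence and uniqueness theorem": 0.9},
    [("Lipschitz condition", "existence and uniqueness theorem",
      "is a foundational dependency of", 0.8)],
)
```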
The most satisfying thing for me (and the most immediately accessible as an outsider) is hovering over an edge between two concepts. That’s really nice and general and exciting. But it’s a little hard to hit the edge with my mouse, and it disappears as soon as I mouse away, so my favorite info in the chart feels the most buried. Some ideas:
Add a wider stroke to the lines (white or invisible) to give them bigger hit targets, though you’d have ambiguous collisions where strokes overlap.
Sample points along the edge curves and use a Voronoi overlay to get the curve nearest the mouse.
Click a concept once to pin it as the “start” concept; then, as you hover over other concepts around the perimeter, use them as the “end” concept and show the relationship text.
Maybe you could even put text on a curve between concepts. So like, I click “Lipschitz condition”, and then the path to “existence and uniqueness theorem” reads “is a foundational dependency of…”, and the path to “Ornstein-Uhlenbeck process” reads “is a generalization of…”, and so on.
Try representations other than the chord diagram. It’s lovely, but I really wanna read all the text!! So I’d also be happy to explore a simpler matrix listing all concepts along the top and side.
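To make the sampled-points idea concrete, here is a minimal sketch of the hit-testing logic in Python. It uses a brute-force nearest-sample scan rather than an actual Voronoi diagram; the two give the same answer, since a Voronoi cell is exactly the region closest to its site. The curve shape (a quadratic Bézier bending through the centre), the positions, and all function names are assumptions for illustration.

```python
import math


def sample_quadratic(p0, control, p2, steps=16):
    """Return steps + 1 points along a quadratic Bezier curve."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        x = u * u * p0[0] + 2 * u * t * control[0] + t * t * p2[0]
        y = u * u * p0[1] + 2 * u * t * control[1] + t * t * p2[1]
        points.append((x, y))
    return points


def build_samples(edges, positions, steps=16):
    """Flatten every edge into (x, y, edge_index) samples.

    Curves are assumed to bend toward the origin (the chart centre).
    """
    samples = []
    for index, (source, target) in enumerate(edges):
        curve = sample_quadratic(positions[source], (0.0, 0.0),
                                 positions[target], steps)
        samples.extend((x, y, index) for x, y in curve)
    return samples


def nearest_edge(samples, mx, my, max_distance=math.inf):
    """Return the index of the edge owning the sample closest to the mouse.

    Returns None when there are no samples or the closest one is farther
    away than max_distance, so the tooltip can hide far from any curve.
    """
    best = min(samples,
               key=lambda s: math.hypot(s[0] - mx, s[1] - my),
               default=None)
    if best is None or math.hypot(best[0] - mx, best[1] - my) > max_distance:
        return None
    return best[2]


# Hypothetical layout: four concepts on a unit circle, two edges.
positions = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (-1.0, 0.0), "D": (0.0, -1.0)}
edges = [("A", "B"), ("C", "D")]
samples = build_samples(edges, positions)
hovered = nearest_edge(samples, 0.3, 0.3)
```

In the notebook, d3-delaunay's `Delaunay.find(x, y)` could presumably replace the linear scan, and a `max_distance` cutoff would stop the tooltip from sticking to a curve when the mouse is far from all of them.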