Concept graph of a graduate stochastic analysis course — extracted without supervision

lavernea · May 20, 2026, 10:03am

A chord diagram of concepts and relationships extracted from 12 lectures of graduate stochastic analysis. 96 concepts, 534 relationships — no predefined ontology, no labeled training data. Structure emerges from the text.

Why this question
Can you recover the conceptual structure of a mathematical field without knowing what field you’re reading? The pipeline doesn’t know it’s processing stochastic analysis — it sees text, runs repeated extractions, and scores concepts and relationships by how consistently they appear across runs. Confidence is frequency, not a classifier output. This is early work; the current corpus is one course. The next step is a second domain to test whether the approach surfaces genuine cross-domain connections.

Methodology
Each lecture is processed with n=10 extraction runs. Concept confidence is the fraction of runs that mention the concept cluster, an idea borrowed from semantic entropy (Farquhar et al. 2024). Synonym clustering uses tiered entailment: two small models vote first, and a larger model breaks the tie when they disagree. A second-pass semantic merge recovers concepts fragmented across naming variants: “infinitesimal generator”, “generator (diffusion)”, and “generator” collapse to a single node.
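To make the frequency-based score concrete, here is a minimal sketch in JavaScript (not the Julia pipeline itself). It assumes each extraction run has already been reduced to a list of canonical concept-cluster labels. The function name `conceptConfidence` and the sample data are hypothetical, and the example uses four runs rather than ten to stay short.

```javascript
// Hypothetical helper: score each concept cluster by the fraction of
// extraction runs in which it appears at least once.
function conceptConfidence(runs) {
  const counts = new Map();
  for (const run of runs) {
    // De-duplicate within a run so a cluster counts at most once per run.
    for (const concept of new Set(run)) {
      counts.set(concept, (counts.get(concept) ?? 0) + 1);
    }
  }
  const confidence = new Map();
  for (const [concept, count] of counts) {
    confidence.set(concept, count / runs.length);
  }
  return confidence;
}

// Made-up example data: four runs over the same lecture, labels already
// mapped to their canonical cluster names.
const runs = [
  ["Lipschitz condition", "infinitesimal generator"],
  ["Lipschitz condition"],
  ["Lipschitz condition", "infinitesimal generator", "Ornstein-Uhlenbeck process"],
  ["Lipschitz condition", "infinitesimal generator", "infinitesimal generator"],
];

const confidence = conceptConfidence(runs);
```

The same counting would apply to relationships if each one is keyed by its (source, target) pair instead of a single label.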

Implementation
The pipeline is written in Julia: LLM calls go through Ollama running locally, the graph is exported as GraphML, and per-lecture reports are merged in a post-processing step. Observable is a pure downstream consumer: all the heavy work is precomputed offline, and the notebook reads a static file.

tophtucker · June 2, 2026, 4:10am

This is wild! Very cool.

The most satisfying thing for me (and the most immediately accessible as an outsider) is hovering over an edge between two concepts. That’s really nice and general and exciting. But it’s a little hard to hit the edge with my mouse, and it disappears as soon as I mouse away, so my favorite info in the chart feels the most buried. Some ideas:

Add stroke to the lines (white or invisible) to give them bigger hit targets, though you’d have ambiguous collisions.
Sample points along the edge curves and use a Voronoi overlay to get the curve nearest the mouse.
Click a concept once to pin it as the “start” concept; then, as you hover over other concepts around the perimeter, use them as the “end” concept and show the relationship text.
Maybe you could even put text on a curve between concepts. So like, I click “Lipschitz condition”, and then the path to “existence and uniqueness theorem” reads “is a foundational dependency of…”, and the path to “Ornstein-Uhlenbeck process” reads “is a generalization of…”, and so on.
Try representations other than the chord diagram. It’s lovely, but I really wanna read all the text!!, and so would also be happy to explore in a simpler matrix listing all concepts along top and side.
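
A rough sketch of the sampling idea, in plain JavaScript. It assumes each edge is drawn as a quadratic Bézier curve from one perimeter point to another with its control point at the diagram center (0, 0), which may not match how the actual chords are drawn. The names `sampleQuadratic`, `buildSamples`, and `nearestEdge` are made up for illustration. A brute-force nearest-sample search gives the same answer as a Voronoi cell lookup; for many edges, the `find` method of d3-delaunay would be the faster way to do it.

```javascript
// Sample n + 1 points along a quadratic Bézier curve p0 -> p2 with control p1.
function sampleQuadratic(p0, p1, p2, n) {
  const points = [];
  for (let i = 0; i <= n; i++) {
    const t = i / n;
    const u = 1 - t;
    points.push([
      u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
      u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1],
    ]);
  }
  return points;
}

// Flatten all edges into one list of samples, each remembering its edge index.
// Assumption: edges look like { from: [x, y], to: [x, y] }.
function buildSamples(edges, samplesPerEdge = 20) {
  const samples = [];
  edges.forEach((edge, edgeIndex) => {
    for (const [x, y] of sampleQuadratic(edge.from, [0, 0], edge.to, samplesPerEdge)) {
      samples.push({ x, y, edgeIndex });
    }
  });
  return samples;
}

// Return the index of the edge whose sample is closest to the mouse,
// or null if nothing lies within maxDistance.
function nearestEdge(samples, mx, my, maxDistance = Infinity) {
  let best = null;
  let bestD2 = maxDistance * maxDistance;
  for (const s of samples) {
    const d2 = (s.x - mx) ** 2 + (s.y - my) ** 2;
    if (d2 < bestD2) {
      bestD2 = d2;
      best = s.edgeIndex;
    }
  }
  return best;
}

// Made-up geometry: two chords on a circle of radius 100.
const edges = [
  { from: [-100, 0], to: [0, 100] },
  { from: [100, 0], to: [0, -100] },
];
const samples = buildSamples(edges);
const hoveredNearFirst = nearestEdge(samples, -24, 27);
const hoveredNearSecond = nearestEdge(samples, 26, -24);
const hoveredFarAway = nearestEdge(samples, 300, 300, 5);
```

Passing a `maxDistance` keeps the tooltip from sticking to some edge when the mouse is well outside the diagram.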

Thanks for sharing, lavernea!
