word2vec

Sometimes I want to analyse text, but it's kind of a pain. I thought word2vec would be useful, but it's not so easy to find online… so I created this: word2Vec / Endpoint Services / Observable. It's the full 3GB, 2.7M-word word2vec model, accessible via an API, so a notebook can sample from it without having to host it.

This is cool, thanks for sharing! I want to try combining this with the on-the-fly clustering and heatmap visualization from this notebook: Python (Pyodide) on Observable running Clustergrammer2

Yeah, I saw that! I think they would go together neatly.

@mootari also found a rather nice off-the-shelf clustering + word2vec application, https://wikipedia2vec.github.io/demo/, which gives some intuition for how the words should cluster.

One issue with the Google word2vec vectors is that they're full of garbage, so it's better to start with a clean corpus of words, look up their vectors, and then cluster.
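Here is a minimal sketch of that clean-list, look-up, cluster workflow in Python. It assumes a hypothetical in-memory `VECTORS` dict standing in for the real endpoint lookup (toy 3-d vectors rather than word2vec's 300-d ones), a simple "purely alphabetic" filter as the cleaning rule, and plain k-means with deterministic farthest-point seeding as the clustering step. None of this is how the notebooks above actually do it.

```python
# Hypothetical toy vectors standing in for the word2vec lookup;
# real word2vec vectors have 300 dimensions.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "puppy": [0.85, 0.15, 0.05],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
    "bus": [0.1, 0.85, 0.25],
    "##junk": [0.5, 0.5, 0.5],
}


def lookup(words, vectors):
    """Keep only purely alphabetic words that actually have a vector
    (an assumed cleaning rule; it also drops phrase tokens like New_York)."""
    return {w: vectors[w] for w in words if w.isalpha() and w in vectors}


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def farthest_point_init(vecs, words, k):
    """Deterministic seeding: start from the first word, then repeatedly
    add the word farthest from all centroids chosen so far."""
    centroids = [vecs[words[0]]]
    while len(centroids) < k:
        far = max(words, key=lambda w: min(dist2(vecs[w], c) for c in centroids))
        centroids.append(vecs[far])
    return centroids


def kmeans(vecs, k, iters=20):
    """Plain k-means on a {word: vector} dict; returns {word: cluster_id}."""
    words = sorted(vecs)
    centroids = farthest_point_init(vecs, words, k)
    assign = {}
    for _ in range(iters):
        # Assignment step: each word goes to its nearest centroid.
        assign = {
            w: min(range(k), key=lambda c: dist2(vecs[w], centroids[c]))
            for w in words
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vecs[w] for w in words if assign[w] == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return assign


clean = ["cat", "dog", "puppy", "car", "truck", "bus", "##junk", "zebra"]
vecs = lookup(clean, VECTORS)  # "##junk" fails the filter, "zebra" has no vector
clusters = kmeans(vecs, k=2)
print(clusters)
```

With real data you would swap the toy dict for vectors fetched from the endpoint and probably use a library k-means, but the shape of the pipeline stays the same: filter first, so the junk never reaches the clustering.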

For a fun application of word embedding vectors, I implemented an AI assistant for the game Codenames in this notebook: Cipher Words / David Kirkby / Observable

(this is using 100-dimensional GloVe vectors pre-trained on Wikipedia).
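To give a rough idea of how embeddings can drive a Codenames clue, here is a small Python sketch. It assumes a simple scoring rule that may well differ from what Cipher Words does: a clue "covers" every team word that is more cosine-similar to it than the closest non-team word, and the best clue is the legal candidate covering the most team words. The 3-d `GLOVE_TOY` vectors and the board words are hypothetical stand-ins for the real 100-d GloVe vectors.

```python
import math

# Hypothetical toy embeddings standing in for 100-d GloVe vectors.
GLOVE_TOY = {
    "ocean": [0.9, 0.1, 0.0],
    "fish": [0.8, 0.2, 0.1],
    "boat": [0.85, 0.0, 0.2],
    "bank": [0.2, 0.9, 0.1],
    "money": [0.1, 0.95, 0.0],
    "sea": [0.95, 0.05, 0.1],
    "cash": [0.05, 1.0, 0.05],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score_clue(clue, team, others, vectors):
    """Return (count, words): team words closer to the clue than the
    closest non-team word, most similar first (an assumed rule)."""
    sim = lambda w: cosine(vectors[clue], vectors[w])
    threshold = max((sim(w) for w in others), default=-1.0)
    hits = sorted((w for w in team if sim(w) > threshold), key=sim, reverse=True)
    return len(hits), hits


def best_clue(candidates, team, others, vectors):
    """Pick the candidate (not itself on the board) covering the most team words."""
    board = set(team) | set(others)
    legal = [c for c in candidates if c not in board]
    return max(legal, key=lambda c: score_clue(c, team, others, vectors)[0])


team = ["ocean", "fish", "boat"]
others = ["bank", "money"]
candidates = ["sea", "cash", "fish"]  # "fish" is on the board, so it is skipped
clue = best_clue(candidates, team, others, GLOVE_TOY)
print(clue, score_clue(clue, team, others, GLOVE_TOY))
```

Using the closest opposing word as the threshold is a cautious choice: it only counts team words the guesser should reach before hitting a wrong card.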