Tools for statistical analysis?

Observable is becoming a major tool in the fields of visual analytics and cartography. D3.js is amazing. Plot is amazing. Etc. But, as I see it, there are few JavaScript libraries for statistical data analysis (and spatial analysis). For example, I have tested hclust.js, hcluster.js and ml-hclust for agglomerative hierarchical clustering. First of all, it is not possible to import these libraries directly with require(). Moreover, they are very slow, to the point of being unusable, when the number of statistical units is too high. In my opinion, this lack of tools for statistical analysis is a real obstacle to Observable becoming mainstream in this field.

Question: does the Observable team plan to develop this kind of tool (e.g. hierarchical clustering, PCA, SOM, …), which is also associated with specific graphic and cartographic representations? Do you think this work has to be done independently of Observable? Or do you think it is not relevant? In my view, the availability of such tools within Observable is a necessity (especially for use in the academic community). What do you think?
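To illustrate the performance point about agglomerative hierarchical clustering, here is a minimal sketch of a naive single-linkage implementation in plain JavaScript. It is not the code of hclust.js, hcluster.js or ml-hclust; the function names euclidean and naiveHclust and the parameter k are assumptions made for this example. Because every merge step rescans all pairs of points, the work grows roughly with the cube of the number of statistical units, which is one plausible reason this kind of code becomes unusable on large datasets.

```javascript
// Hypothetical helper: Euclidean distance between two numeric vectors.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Naive single-linkage agglomerative clustering (illustrative sketch only).
// points: array of numeric vectors; k: number of clusters to stop at.
// Returns the final clusters (arrays of point indices) and the merge history.
function naiveHclust(points, k) {
  const target = Math.max(1, k);
  // Start with every point in its own cluster.
  let clusters = points.map((_, i) => [i]);
  const merges = [];

  while (clusters.length > target) {
    let best = { i: -1, j: -1, d: Infinity };
    // Scan all cluster pairs; this rescan at every step is what makes
    // the naive approach roughly cubic in the number of points.
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        // Single linkage: distance between the two closest members.
        let d = Infinity;
        for (const p of clusters[i]) {
          for (const q of clusters[j]) {
            d = Math.min(d, euclidean(points[p], points[q]));
          }
        }
        if (d < best.d) best = { i, j, d };
      }
    }
    // Merge the closest pair and record the step.
    const merged = clusters[best.i].concat(clusters[best.j]);
    merges.push({ a: clusters[best.i], b: clusters[best.j], distance: best.d });
    clusters = clusters.filter((_, idx) => idx !== best.i && idx !== best.j);
    clusters.push(merged);
  }
  return { clusters, merges };
}
```

For example, naiveHclust([[0], [1], [10], [11]], 2).clusters groups the points with indices 0 and 1 together, and 2 and 3 together. Faster approaches (such as caching the distance matrix or nearest-neighbor chain algorithms) reduce this cost, but they are outside the scope of this sketch.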

Totally relevant! As you can tell from the many notebooks about these topics, we are definitely interested in bringing as many analytical tools as possible to Observable.

Re: hclust, I see that this notebook includes it from bundle.run: 27 - Clustermap / Theo Dedeken / Observable

The list could also include simple-statistics, reorder.js, various flavors of UMAP (like druid.js), etc.

We’ve been trying to support existing efforts, to demonstrate how they work, to develop some code for individual algorithms for which there was no implementation (e.g. sliced optimal transport), and to make existing libs consumable in notebooks.

However, at this point I don’t think Observable has plans to develop a one-stop-shop statistics library.

If anything, I’d be interested in collaborating on a notebook that lists all the existing methods that are (more or less well) supported, and those that are missing. It could serve as a map both for people who want to do the analysis and for developers who want to expand the possibilities.

OK. I opened a notebook shared with the Observable ambassadors. Do you see it? Should I keep it private, or should I make it public right now?

Hey @neocarto and @fil,

absolutely agree with everything you said!
We had a longer discussion about this (stats/data science in Observable in general) in this thread: Observable for research - (advanced) statistics

Since then, I’ve been trying to port a few R/Python functions to JS, which are also based on mljs:
R’s lm, vif, View, summary, stargazer: Linear models in Observable notebooks / Christoph Pahmeyer / Observable
R’s cor, var(i), corrplot, plot, hist: Correlation, Variance And Covariance (Matrices) / Christoph Pahmeyer / Observable
Seaborn’s (Python) pairplot: Plotting pairwise relationships in a dataset / Christoph Pahmeyer / Observable

Just as you said before, these lack a solid/fast WASM implementation in the background, so they will become slow on large datasets! Anyway, I’ll add those to the notebook later!
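
As a rough idea of what a function like R’s cor computes, here is a minimal plain-JavaScript sketch of the Pearson correlation and a correlation matrix. It is not the code from the notebooks listed above and does not use mljs; the names pearson and correlationMatrix, and the input shape (an object mapping column names to arrays), are assumptions for illustration.

```javascript
// Arithmetic mean of a numeric array.
function mean(xs) {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Pearson correlation coefficient between two equal-length numeric arrays.
// Returns NaN if either array is constant (zero variance).
function pearson(x, y) {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("pearson: arrays must have the same length (>= 2)");
  }
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < x.length; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlation matrix for a set of named columns (hypothetical input shape:
// { columnName: [values...] }). Row and column order follows `names`.
function correlationMatrix(columns) {
  const names = Object.keys(columns);
  const matrix = names.map(a => names.map(b => pearson(columns[a], columns[b])));
  return { names, matrix };
}
```

Each pair of columns costs a single pass over the data, so the whole matrix is linear in the number of rows and quadratic in the number of columns.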

Especially for the summary (Summary Table / Observable / Observable) and, obviously, the plot functions (Shorthand / Observable Plot / Observable / Observable), Observable has much better alternatives though :blush:

I strongly agree that the lack of a good, clean, well-supported library of statistical functions is currently the single biggest weakness of JavaScript and Observable for working with data.

As we know all too well, the human mind is easily deceived by a pretty picture. Don’t get me wrong, I love the elegant graphics that Observable facilitates, but responsible data analysis must also include statistical support for the conclusions being drawn.

This is just pie in the sky, but if the functionality, ease of use, and open source licensing of JASP (https://jasp-stats.org/) were available in JavaScript for Observable, it would be a major advance for JavaScript-based data science.

A while back I started this notebook which might contain some useful resources:
