Observable for research - (advanced) statistics

tl;dr: Here’s a notebook which presents a few approaches toward importing stdlib. I’ve used the erf/erfinv/erfc/erfcinv example (/cc @mcmcclur) since that sort of rigor is more or less the raison d’être for stdlib and is why I evangelize here and there and keep chipping away at improving it.

Though stdlib has a long way to go. If there are things which would be particularly helpful to people, please let me know or ask for them on gitter.im/stdlib-js/stdlib.

Here are a couple variations for using stdlib on Observable:

@stdlib/stdlib

// Local/node usage only; not for Observable
stdlib = require('@stdlib/stdlib')
erf = require('@stdlib/math/base/special/erf')

This variation is designed for local/node usage so that you can target specific imports and write, e.g. var blas = require('@stdlib/blas') and then bundle it locally. The module includes the unbundled source so you can target specific pieces. It is not appropriate for CDN usage.

@stdlib/dist-tree

The dist modules include only the bundled source and so are a better fit for CDN usage. See: @stdlib/dist-tree. You can read more here. They exclude particularly large pieces like sample datasets.

The following is ~500kb gzipped and includes blas (which, at present, is mostly just a placeholder package for a future translation of blas to js):

stdlib = require('https://unpkg.com/@stdlib/dist-tree')
blas = stdlib.blas

@stdlib/esm

The @stdlib/esm module exposes stdlib rewritten as ES modules. It is a work in progress, so internals may change, but usage should remain stable. In this module, each of stdlib’s modules is bundled into a single (minified) file, and intra-project dependencies are resolved as ES modules. There are sometimes a large number of intra-project dependencies, but I think a combination of multiplexed HTTP/2 requests and caching makes it pretty acceptable for Observable. If you import very large pieces of stdlib (e.g. all of it), then the dist bundles may be a better fit. I’m almost happy with the result [1].

Finding a good CDN is challenging. Tip of the hat to @mootari for figuring out that jsdelivr.com definitely does the best job there. I worked pretty hard to vendor any external deps, bundle, minify, etc, so that all it needs is static file delivery. Thus, for a variety of related reasons, some CDNs (e.g. unpkg.com, skypack.dev, deno.land/x) either reject the size or try to repeat some of these preparation steps and really struggle.

The following work pretty well and allow importing of either blas (called a namespace package, which exposes as named exports all of its modules including ddot) or just ddot:

// Import blas:
blas = import("https://cdn.jsdelivr.net/npm/@stdlib/esm@0.0.3/blas.js")

// Import a single module from blas:
ddot = (await import("https://cdn.jsdelivr.net/npm/@stdlib/esm@0.0.3/ddot.js")).default
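For anyone unfamiliar with BLAS naming, ddot is the level-1 double-precision dot product. Below is a minimal plain-JS sketch of what such a call computes, assuming the conventional BLAS argument order (N, x, strideX, y, strideY). The name ddotRef is hypothetical and this is a stand-in for illustration, not stdlib’s implementation:

```javascript
// Plain-JS stand-in for a BLAS level-1 double-precision dot product.
// NOT stdlib's code; `ddotRef` is a hypothetical name that only mirrors
// the conventional (N, x, strideX, y, strideY) argument order.
function ddotRef(N, x, strideX, y, strideY) {
  if (N <= 0) {
    return 0.0;
  }
  // BLAS convention: with a negative stride, start from the far end.
  let ix = strideX < 0 ? (1 - N) * strideX : 0;
  let iy = strideY < 0 ? (1 - N) * strideY : 0;
  let dot = 0.0;
  for (let i = 0; i < N; i++) {
    dot += x[ix] * y[iy];
    ix += strideX;
    iy += strideY;
  }
  return dot;
}

const x = new Float64Array([1, 2, 3, 4]);
const y = new Float64Array([5, 6, 7, 8]);

// All four elements: 1*5 + 2*6 + 3*7 + 4*8
const full = ddotRef(4, x, 1, y, 1);

// Every other element: 1*5 + 3*7
const everyOther = ddotRef(2, x, 2, y, 2);

// y traversed in reverse: 1*8 + 2*7 + 3*6 + 4*5
const reversed = ddotRef(4, x, 1, y, -1);
```

The stride arguments are what let the same routine walk a row, a column, or a reversed view of a larger buffer without copying.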

[1] sourcemaps are present but slightly broken and I’m not eager to use magic-string to fix that until I’m pretty confident it’s headed in the right direction as a whole. The single biggest challenge is that some CDNs (rightfully) reject the size, even though you may only request a few bytes of files. We could trim this down, but not needing to trim it down and complicate things with separate bundles for separate use cases was certainly a large part of the motivation for ES modules to begin with. TBH I don’t know how to break the ES module distribution into parts since that would require ES module dependencies across those parts. C’est la vie. Suggestions welcome. :grinning_face_with_smiling_eyes:


@chrispahm To your specific question about stats, one of the core stdlib developers is currently a stats postdoc using/developing stdlib as part of his work, so it has a particular emphasis on stats. See: @stdlib/stats, @stdlib/stats/base, and @stdlib/math/base/special (*). Unfortunately, I’m not the one who can really speak about the structure and breadth of the stats functionality, but I do know that a lot of effort has been put into rigorous low-level math functions that power it (though linear algebra is not there yet). The above outlines how the functions can be imported, and gitter.im/stdlib-js/stdlib is a good place to ask about features/usage if you’re interested.

(*) “base” refers to core implementations that do the heavy lifting, as opposed to ongoing work which will wrap those in higher-level APIs, for which sin(a) might return a per-element sine if a is an ndarray, or a complex number if a is complex.
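To make the base/wrapper distinction concrete, here is a rough sketch in plain JS. Everything in it is hypothetical (the names baseSin and sin, the use of plain arrays in place of ndarrays, and the { re, im } object standing in for a complex number); it is not stdlib’s API, only an illustration of a thin dispatching layer over a numeric core:

```javascript
// "Base" implementation: assumes a plain number; no validation, no dispatch.
// (Hypothetical stand-in; here it simply defers to Math.sin.)
function baseSin(x) {
  return Math.sin(x);
}

// Hypothetical higher-level wrapper that dispatches on the input type.
function sin(a) {
  if (typeof a === 'number') {
    return baseSin(a);
  }
  // Plain arrays and Float64Arrays stand in for ndarrays in this sketch.
  if (Array.isArray(a) || a instanceof Float64Array) {
    return Array.from(a, (v) => baseSin(v));
  }
  // A { re, im } object stands in for a complex number:
  // sin(x + iy) = sin(x)cosh(y) + i cos(x)sinh(y)
  if (a !== null && typeof a === 'object' &&
      typeof a.re === 'number' && typeof a.im === 'number') {
    return {
      re: Math.sin(a.re) * Math.cosh(a.im),
      im: Math.cos(a.re) * Math.sinh(a.im)
    };
  }
  throw new TypeError('unsupported input type');
}

const scalarResult = sin(0);
const elementwise = sin([0, Math.PI / 2]);
const complexResult = sin({ re: 0, im: 1 });
```

The point of the split is that the base function stays small and fast for callers who already know their input is a number, while the wrapper pays the dispatch cost only where the convenience is wanted.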


Thanks a lot, Ricky! stdlib is definitely on my list to try soon.

@mcmcclur Don’t hesitate to offer thoughts/feedback/requests. If there’s something you’re interested in (ODEs, for example), there’s a good chance it’s not there but that it’s the goal for it to exist. :slight_smile: Knowing what’s particularly valuable/desired might also help kickstart implementation.

I’ve just come across this: ES2021, which, in particular, lists WeakRef and FinalizationRegistry as available. I was wondering if you guys have had any further discussions or made progress on plans for BLAS/LAPACK. I don’t work on this myself, but I’m trying to keep tabs on who might be, to get a sense of potential timelines for availability.