I’ve been working for some time on a d3 project outside Observable. The project implements an economic model which consumes some file data and visualises it. My goal is to split off just the model and data part, make it available as a module, and import it into the Observable ecosystem to do analysis and visualisation (or to make the model available to others).
The current architecture is somewhat spaghetti (though it’s better than it was), but I think it can be packaged as a collection of a few modules. What I am looking for is advice, suggestions for reading, or a pattern for implementing the main module.
In the current implementation, on load the project reads CSV data (some population data), parses it, and saves it as a nasty global object, which other functions use for calculations. My plan was to have a main module with an initialise function [?] that takes the CSV file name [?] and prepares the main data object as a member of this module, which can then be passed to other functions. Is this a typical pattern? I’m guessing there must be lots of other modules that start by consuming data and using it to initialise some sort of data table.
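To make the pattern concrete, here is a minimal sketch of that initialise step (all names here are hypothetical, not from your project): the parsed table lives inside the module instead of a global, and the returned functions close over it.

```javascript
// initialise() takes already-parsed CSV records (however you obtained
// them: d3.csv, a Node CSV parser, ...) and returns a model object.
function initialise(rows) {
  // Index the records once, up front.
  const byYear = new Map(rows.map(d => [Number(d.year), Number(d.population)]));
  return {
    // Callers never touch the raw rows, only the lookup function.
    get_population: year => byYear.get(year)
  };
}

// Usage: hand the parsed rows to the factory instead of storing a global.
const model = initialise([
  { year: "2000", population: "6143000000" },
  { year: "2010", population: "6957000000" }
]);
console.log(model.get_population(2010)); // 6957000000
```

This is essentially dependency injection: the CSV-reading code and the calculation code only meet at the initialise call, so either side can be swapped out independently.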
This is not a very good explanation, so any questions are welcome!
Hi — not sure! Could you post an example, even if it’s a small/fake piece of what you’re working on, that captures the pattern you’re describing? At a glance it sounds totally reasonable. Lots of notebooks have a cell with more “raw” data, then another cell that parses/cleans/filters it, and then a ton of other cells that reference that one.
So a toy example: suppose my module is really simple: it exports a function to return a single point from a dataset, which is read from a .csv (or whatever) once on initialisation. Perhaps the dataset is just something like a table of global population against year, and the module exports a single function get_population(year).
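One way that toy module might look (a sketch; the file contents and names are invented, and the CSV string stands in for whatever you actually load):

```javascript
// population.js -- the dataset is parsed once, when the module is loaded;
// importers only ever see get_population.
const csvText = "year,population\n2000,6143000000\n2010,6957000000"; // stand-in for the loaded file

const table = new Map(
  csvText
    .trim()
    .split("\n")
    .slice(1)                          // skip the header row
    .map(line => {
      const [year, pop] = line.split(",");
      return [Number(year), Number(pop)];
    })
);

function get_population(year) {
  return table.get(year);
}

// In a CommonJS package you would add: module.exports = { get_population };
console.log(get_population(2000)); // 6143000000
```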
I hope this makes sense! Basically I would like importing modules to be unaware of the source of the dataset; they should just care that there is a function to give them a result.
In Observable you would fetch your data in a separate cell (let’s call it “data”), then have get_population() reference that cell. Authors would import get_population into their notebooks, but could optionally also import data if they want to. Observable notebooks have no “internal” cells – any named cell can be imported.
In terms of execution, the data cell would only run once, and only if some other cell references it directly or indirectly (e.g. via get_population()). Note that Observable’s Runtime automatically resolves top-level (i.e., cell-level) promises. So if your “data” cell returns a fetch promise, the code defined in your “get_population” cell would only run once that promise has resolved.
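A plain-JS analogue of those two cells may help (cell names and data are invented). In a notebook, the “get_population” cell would just reference `data` and receive its resolved value; outside the Runtime we have to await the promise explicitly:

```javascript
// The "data" cell: a promise, as a fetch-based cell would be.
const data = Promise.resolve([          // stand-in for fetch(...).then(parse)
  { year: 2000, population: 6143000000 },
  { year: 2010, population: 6957000000 }
]);

// The "get_population" cell: in Observable, this body runs only after
// `data` has resolved; here we model that with .then().
const get_population = data.then(rows => {
  const byYear = new Map(rows.map(d => [d.year, d.population]));
  return year => byYear.get(year);
});

get_population.then(f => console.log(f(2010))); // 6957000000
```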
However, this also means that the data gets loaded whenever a cell contains a reference to “get_population”, even if the function itself is never called. If this poses a problem, we can discuss strategies to work with large amounts of data.
I see. fs is not available in the browser, and since you already bundle your data with your module, I recommend wrapping the data as an object in an ES module and having your get_population code import that module. This way you avoid any async fetching, but it requires you to put more effort into your package’s bundling step.
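As a sketch of that layout (file names and contents are illustrative, not a definitive implementation), the two files might look like:

```javascript
// data.js -- the dataset baked into an ES module:
export const population = [
  { year: 2000, population: 6143000000 },
  { year: 2010, population: 6957000000 }
];

// get_population.js -- imports the baked data; no fetch, no fs:
import { population } from "./data.js";

const byYear = new Map(population.map(d => [d.year, d.population]));

export function get_population(year) {
  return byYear.get(year);
}
```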
The preprocessing will likely be a tradeoff between the amount of data and the amount of computation that needs to be performed. If your data is already compact and/or preprocessing is fast, it’s reasonable to do it on the fly whenever your module gets imported. Otherwise you might want to consider “baking” the already preprocessed data into your module.
Does that answer your question?
Don’t sweat too much about compatibility in the beginning. There are several good bundling services available (e.g. Skypack, unpkg) that can take almost any package format and spit out either UMD or ES modules that can be used in the browser.
As for requiring/importing packages in Observable notebooks, check out this guide:
Thanks so much for the amazing help: I am almost there, and I think you have answered my question. Indeed I’m trying not to sweat the small stuff at this stage, but I also don’t want to leave the whole packaging bit to the end and realise I’m stuck for want of having made some good choices earlier on! I appreciate a lot of this is basic JS toolchain stuff.
I think the best solution is probably for me to preprocess my data and bake it into a module (~~daft question, but is this a case of copying raw JSON into the source, or is there a more optimised method?~~ I think I can just require the .json file from my data module?). My raw CSV is about 4MB. So I will end up with a module (say D) for the data, imported by a module with functions (call it F) to act on it. Then I can require F in my notebook.
A final wrinkle: my tiny package uses CommonJS, as I am trying to use Jest for testing and that seems easier. Is this going to work once I package it and try to require it?
I appreciate this is perhaps a rabbit hole you didn’t need to fall down, but it is extremely useful.
Ask as many questions as you want! I’m sure that many have wondered the same, and reading this thread will help them as well.
My experience with bundling is extremely limited as well, but I would think that manually preprocessing and copying your data is perfectly fine if you don’t expect it to change. Otherwise you may want to consider more automated means where you write the resulting JSON out into a module file as part of your publishing process.
You’d import your package via Skypack, which takes in many formats, including CJS, and spits out ES6 modules:
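In a notebook that would look something like this (the package name is a placeholder for yours):

```javascript
// An Observable cell -- dynamic import of the Skypack-converted package:
pop = import("https://cdn.skypack.dev/my-population-model")

// Then, in another cell:
// pop.get_population(2020)
```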