Framework: How to buffer consecutive data

Hello!
I have a simple app that gets a new set of data each day, which is great! I would like the same app to keep, say, 2-5 days of history in order to offer a better dashboard: trend analysis, for instance.
I am thinking of having GitHub Actions do this work. Something like:

  1. Upload the history.json artifact.
  2. Do an npm ci of Framework, which as a side effect merges today's data with history.json.
  3. Download the history.json artifact. The exact pathname of the history file is an issue.

I have very little experience with GitHub Actions. Do you think I can succeed? Are there better options, in your experience?

Thanks

Alain

You can use the official GitHub cache action (actions/cache) to persist data between workflow runs.
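A minimal cache step could look something like this (the path and key are placeholders, not specific to your project):

```yaml
# Illustrative only; adjust the path and key to your setup.
- uses: actions/cache@v4
  with:
    # The file (or directory) to persist between workflow runs.
    path: history.json
    # Using the run id means a fresh cache entry is saved on every run...
    key: history-${{ github.run_id }}
    # ...while restore-keys falls back to the most recent previous entry.
    restore-keys: |
      history-
```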

Alternatively, you could aggregate your data in an external database and pass the credentials via GitHub Actions secrets. If you don't want to manage additional costs, then this list of free database hosting options might help.

Thank you mootari,

actions/cache seems to work fine, but I am struggling with the pathname correspondence of my history file. Do I need to use the src/data path or the _file/data path? To write? To read? It looks like Framework generates a random key (history.XXXXXXX.json) to save the file. I am trying to write flexible code that handles every possible configuration, but so far no luck. Any advice?
An external database would be my plan B. Thank you for the Actions secrets tip.

Alain

Where is the data in your history.json coming from? Are you already generating it via a data loader? If so, I would recommend having your data loader write to and read from its own cache (which you persist and restore via the action), and simply generating a new history.json on every build.

Exactly,
My data loader does the following:

  1. read the history (readFileSync)
  2. read the fresh data (fetch)
  3. aggregate it into the history
  4. write the history (writeFileSync)
  5. provide the history to the app (standard output)

What do you mean by "its own cache"? A ./cache directory outside of src/data?
Will this simplify my pathname variation problem?

Thanks

Alain

From your description it seems that the only thing you'd need to cache is your sync file (assuming it's not versioned). The steps in your GitHub workflow would be:

  1. run actions/checkout
  2. run actions/cache with the path to your history cache (i.e. your sync file) - can be anywhere, but should be outside the dist directory so it doesn’t get removed
  3. run the build
  4. deploy/upload the build

(the action should automatically handle cache updates)
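Put together, something along these lines should work (a sketch only; the schedule, cache key, and deploy command are assumptions that depend on your project):

```yaml
name: build
on:
  schedule:
    - cron: "0 6 * * *"   # e.g. once per day
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: .cache/history.json   # your sync file, outside of dist
          key: history-${{ github.run_id }}
          restore-keys: |
            history-
      - run: npm ci
      - run: npm run build    # runs the data loader, which updates the sync file
      - run: npm run deploy   # or upload the dist directory however you publish
```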

After many tries, one solution is 🙂:

  1. In the data loader: aggregate the data into a file at the root level, ./history.json (don't try to put it into src or _file).
  2. In GitHub Actions: cache that exact same file, history.json.

But the history.json file at the root level can no longer be downloaded, for instance from a link in a .md page.
Never mind, there is a dynamic content solution:
How to download a file from framework?

Are you referencing the file as FileAttachment in any of your app’s pages? Your data loader should only output the file contents, not write the file itself.

If your data loader is called e.g. /data/history.json.js and you reference a FileAttachment(`/data/history.json`).json(), then Framework will automatically run the loader.
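For example, in one of your pages (a sketch; where the page lives and how you render the data is up to you):

```js
// Inside a fenced js block of a page such as src/index.md.
// Framework sees the reference below, finds no static file at src/data/history.json,
// and therefore runs the data loader src/data/history.json.js at build time.
const history = await FileAttachment("/data/history.json").json();
display(history);
```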

No FileAttachment, since my reads/writes happen in my data loader. I understood that FileAttachment is not available there, only fixed URLs and readFileSync/writeFileSync.
In my .md files there is only FileAttachment('/src/data/current.json'), which comes from the standard output.

Can you expand on why you’d want to download the cache file?

Yes. I would like to keep a record of the ‘5 latest dataset values’ at a given date. Why not?
It is very confusing that Framework renames files in a random way: file.XXXX.txt.

I would be very glad to have a clear explanation of what happens to file names during and after deploy.

Mootari, many thanks for your patience.

Your data loader will end up producing two files, but in two very different ways and with very different purposes:

  1. The cached file that it only reads and updates internally. This file is not meant to be exposed or served.
  2. The rendered file attachment that it produces by simply outputting the JSON.

I’ve put up an example that demonstrates a data loader with its own Actions cache: GitHub - mootari/dataloader-cache-example

On every build the data loader fetches a new entry and adds it to the cached file. It then outputs the last five entries, which the app page then renders.
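A stripped-down sketch of that pattern (this is not the exact code from the repository; the cache path and fetch URL are placeholders):

```js
// data/history.json.js - illustrative data loader sketch.
import {existsSync, mkdirSync, readFileSync, writeFileSync} from "node:fs";

const cacheDir = ".cache";                     // persisted/restored via actions/cache
const cacheFile = `${cacheDir}/history.json`;  // the loader's private cache file

// 1. Read the previously cached history, if any.
const history = existsSync(cacheFile)
  ? JSON.parse(readFileSync(cacheFile, "utf8"))
  : [];

// 2. Fetch today's entry (placeholder URL).
const entry = await (await fetch("https://example.com/api/today")).json();

// 3. Append it and update the cache file.
history.push({date: new Date().toISOString(), ...entry});
mkdirSync(cacheDir, {recursive: true});
writeFileSync(cacheFile, JSON.stringify(history));

// 4. Output only the five most recent entries; this is what the page's
//    FileAttachment receives as history.json.
process.stdout.write(JSON.stringify(history.slice(-5)));
```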

You can see the output here: Dataloader Cache

The example is great! It should be added to the other examples, in my opinion. Thank you very much.

What do you mean by “…leaving a trail of dead caches. Acceptable for small datasets, but possibly devastating for larger ones.”? GitHub Actions will fail after a while? I hope not.

Alain

GitHub applies two limits to Actions caches:

  1. It will remove caches that haven’t been accessed in the last 7 days.
  2. If the repository's overall cache size exceeds 10 GB, GitHub will start removing the oldest caches.

In practice you’ll likely have a hard time hitting that limit unless you update the cache very frequently and/or store very large files. I also plan to look into removing the old cache entry so that this would no longer be a problem.

I’ve updated the workflow file to automatically remove the old cache entry, and to handle the absence of a cache as well as workflow reruns gracefully:
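One way to do that cleanup is with the gh CLI; roughly like this (a sketch, not necessarily how the example workflow does it; it assumes the cache is restored via actions/cache/restore in a step with id "restore" and that the job has the actions: write permission):

```yaml
- name: Remove the previously restored cache entry
  if: steps.restore.outputs.cache-matched-key != ''
  continue-on-error: true   # the entry may already be gone on a workflow rerun
  run: gh cache delete "${{ steps.restore.outputs.cache-matched-key }}"
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    GH_REPO: ${{ github.repository }}
```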

May I also suggest updating the selected solution to point to the example?