d3.csv() and DuckDB: caching remote files

Hi there! Can you suggest the best way to cache downloaded files when using d3.csv() and DuckDB in the browser? I use Observable Framework and Google Spreadsheet CSV files to build a marketing and analytics trends dashboard. Here is a code example:

// Load the published sheet as CSV, then register it as a DuckDB table named "trends"
const trendweek = await d3.csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vStElZLXAh6_Zy1z6HqCc6xf6_xOVBJMOzVvNQGLOf6KmsVY07etajt3_gFplnqoCCOi17bBkib_Xv0/pub?gid=2033815436&single=true&output=csv')
const sql = await DuckDBClient.sql({trends: trendweek})

Every time I reload the page or open it in a new tab, it takes too long to download all the CSV data, which can distract users and makes the page feel slow.

So even requesting just the metadata (28 bytes of information) takes nearly as long as the full download:

// A HEAD request asks the server for the response headers only, with no body
(
  await fetch(
    "https://docs.google.com/spreadsheets/d/e/2PACX-1vStElZLXAh6_Zy1z6HqCc6xf6_xOVBJMOzVvNQGLOf6KmsVY07etajt3_gFplnqoCCOi17bBkib_Xv0/pub?gid=2033815436&single=true&output=csv",
    {
      method: "HEAD"
    }
  )
).headers.get("content-length")
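
For comparison, here is a quick way to time both request types (a sketch; timeRequest is a hypothetical helper, and the URL is the same published CSV as above):

// Sketch: compare HEAD vs. full GET latency with performance.now()
const csvUrl = "https://docs.google.com/spreadsheets/d/e/2PACX-1vStElZLXAh6_Zy1z6HqCc6xf6_xOVBJMOzVvNQGLOf6KmsVY07etajt3_gFplnqoCCOi17bBkib_Xv0/pub?gid=2033815436&single=true&output=csv";

async function timeRequest(url, method = "GET") {
  const t0 = performance.now();
  const res = await fetch(url, { method });
  await res.arrayBuffer(); // drain the body (empty for a HEAD response)
  return performance.now() - t0;
}

const headMs = await timeRequest(csvUrl, "HEAD"); // headers only
const getMs = await timeRequest(csvUrl);          // full download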

So that API endpoint is just slow. It doesn’t expose cache headers or a version identifier that I can see (e.g. an ETag).

The Google Sheets API v4 supports gzip and is probably faster too, as it’s designed for programmatic access.
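
Something like this should work if you enable that API, though I haven’t tested it against your sheet (SHEET_ID and API_KEY are placeholders you would have to supply, and the "Sheet1" range is an assumption):

// Sketch: fetch cell values via the Sheets API v4 and reshape them into row objects.
// fetch() negotiates gzip automatically via the Accept-Encoding header.
const SHEET_ID = "YOUR_SHEET_ID"; // placeholder
const API_KEY = "YOUR_API_KEY";   // placeholder

const res = await fetch(
  `https://sheets.googleapis.com/v4/spreadsheets/${SHEET_ID}/values/Sheet1?key=${API_KEY}`
);
const { values } = await res.json(); // 2D array: first row is the header
const [header, ...rows] = values;
const table = rows.map((r) => Object.fromEntries(header.map((h, i) => [h, r[i]])));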

Just checking my notes… this is what I have used in the past:

faster = {
  const sheetId = "1Z7Dja43FepxVOJc5_pMdP0etERM6h0BPAWT74zjdbno";
  const sheetName = "Sheet1"; // Change if your sheet name is different
  const url = `https://docs.google.com/spreadsheets/d/${sheetId}/gviz/tq?tqx=out:csv&sheet=${sheetName}`;

  return d3.csv(url);
}

Which seems faster.

It definitely works, thanks!
But it forces me to make the whole Google document accessible on the internet, and it contains some data I don’t want to share with others :slightly_smiling_face:
The second thing is that users still have to wait for the data to load after reloading the page or opening it in new tabs.
I asked ChatGPT, and it suggested using the browser’s storage API (like localStorage) to store data locally, but maybe there is another, more convenient way to achieve the same result using Observable functionality or another API?

Yes, you can use localStorage to keep the assets in the browser, and perhaps revalidate in the background so repeat page loads have zero latency. You can also download the CSV and serve it through the framework (e.g. as a data loader), which is better optimized. Or you can switch from CSV to compressed Parquet (see hyparquet, a Parquet file parser for JavaScript: https://github.com/hyparam/hyparquet).
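
For the localStorage route, a minimal stale-while-revalidate sketch could look like this (assuming the CSV fits within localStorage’s ~5 MB quota; the "trends-csv" key and the loadTrendsCsv helper are just names I made up):

// Sketch: serve the cached CSV immediately, refresh it in the background
const CSV_URL = "https://docs.google.com/spreadsheets/d/e/2PACX-1vStElZLXAh6_Zy1z6HqCc6xf6_xOVBJMOzVvNQGLOf6KmsVY07etajt3_gFplnqoCCOi17bBkib_Xv0/pub?gid=2033815436&single=true&output=csv";
const KEY = "trends-csv"; // hypothetical cache key

async function loadTrendsCsv() {
  const cached = localStorage.getItem(KEY);
  // Always start a network fetch; update the cache when it completes.
  const fresh = fetch(CSV_URL)
    .then((res) => res.text())
    .then((text) => {
      try { localStorage.setItem(KEY, text); } catch { /* quota exceeded: skip caching */ }
      return text;
    });
  // Use the cached copy if present (zero latency); otherwise wait for the network.
  const text = cached ?? (await fresh);
  return d3.csvParse(text);
}

const trendweek = await loadTrendsCsv();
const sql = await DuckDBClient.sql({trends: trendweek});

The first visit still pays the full download; repeat loads are instant, at the cost of possibly showing slightly stale data until the next reload.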

Also, are you sure it’s the file that is the slow part? Often I find the 7 MB DuckDB dependency is the actual cause of slow starts. Did you measure in the network tab of the browser?

Taking a look at the network measurements is a good idea, I’ll get back with what I find.

Okay, I’m probably doing something wrong, but I set up “Slow 4G” throttling with the cache disabled in DevTools (Chrome on Linux) to test weak network conditions.
Spreadsheet request: waiting and downloading take 700 ms and 400 ms respectively, which is not bad at all.

But then I saw this:

Request to https://yugin-pro.nl/_npm/@duckdb/duckdb-wasm@1.29.0/dist/duckdb-eh.wasm: waiting and downloading take 600 ms and 26,000 ms respectively. :grimacing:
I have no idea why this is.

Yeah, that’s DuckDB, it’s massive. Unless I need SQL, I use hyparquet.
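
In case it helps, reading a remote Parquet file with hyparquet looks roughly like this (a sketch from memory of its README; the URL is a placeholder and exact signatures may vary between versions):

// Sketch: load a remote Parquet file into an array of row objects with hyparquet
import { asyncBufferFromUrl, parquetReadObjects } from "npm:hyparquet";

const file = await asyncBufferFromUrl({ url: "https://example.com/trends.parquet" }); // placeholder URL
const rows = await parquetReadObjects({ file }); // [{column: value, ...}, ...]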
