Can I create a table from parquet in duckdb with "DuckDBClient.of"?

I can manage to create a table and then populate it from a remote parquet file:

statsdb = DuckDBClient.of()
statsdb.query(`CREATE TABLE stats AS SELECT * FROM "https://llimllib.github.io/nba_data/players_2023.parquet"`)

Is it possible to create the stats table via DuckDBClient.of, or do I need to do this CREATE TABLE?

If it is not possible, consider this a feature request - if it is I’d love to know how!

With Observable’s client, DuckDBClient.of() loads the duckdb library and WASM file on demand, so it returns a Promise that you need to await.
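
A minimal sketch of that await, assuming the client exposes `of()` and `query()` as used at the top of this thread. The function name `createStatsDb` and its parameters are hypothetical; the client is passed in as an argument so the snippet does not depend on a notebook global.

```javascript
// Hypothetical helper: wait for the client, then create the table.
// `DuckDBClient` is whatever object provides `of()`, e.g. Observable's client.
async function createStatsDb(DuckDBClient, parquetUrl) {
  // of() returns a Promise; it settles once the library and WASM file have loaded.
  const db = await DuckDBClient.of();
  // Same CREATE TABLE statement as in the original question.
  await db.query(`CREATE TABLE stats AS SELECT * FROM "${parquetUrl}"`);
  return db;
}
```

Inside a notebook this would be called as `createStatsDb(DuckDBClient, "https://llimllib.github.io/nba_data/players_2023.parquet")`.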

yup, that’s what I’m doing already - I was trying to see if there was a way to do something like:

DuckDBClient.of({
  sometable: "https://someurl/to/a/parquet.file"
})

You also can actually simplify that parquet_scan like I did in the sample at the top of this post

The notebook I used to hack on this is here: Figuring out how to use plot with duckdb / Bill Mill / Observable

Here is a way to get close by mimicking the File Attachments API. Just need to return a name and a url method in a helper function.
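
A rough sketch of what such a helper could look like. The name `remoteFile` is hypothetical, and both the `{file: ...}` wrapper and the reliance on just `name` and `url()` are assumptions about how the client reads its sources, so treat this as a starting point rather than a documented interface.

```javascript
// Hypothetical helper: wrap a remote URL in an object that looks enough like a
// file attachment (a `name` plus an async `url()` method) for the client to load.
function remoteFile(url) {
  // Keep the last path segment as the name so the ".parquet" suffix survives.
  const name = new URL(url).pathname.split("/").pop();
  return {file: {name, url: async () => url}};
}

// Intended use in a notebook cell (assumption, not executed here):
// statsdb = DuckDBClient.of({
//   stats: remoteFile("https://llimllib.github.io/nba_data/players_2023.parquet")
// })
```

Keeping the file extension in `name` matters if the client picks a loader based on it, which is also an assumption here.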


Very neat, thank you! I had trouble figuring out the interface for the duckdb constructor.

pulling Parquet files from GitHub seems to be OK.

Just a minor note: that’s only true of github.io; it doesn’t work straight from github.com. I added an example to my notebook to demonstrate.


It does, if you fetch the raw file:

https://raw.githubusercontent.com/llimllib/nba_data/main/data/players_2023.parquet

Helper:

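// Return the raw.githubusercontent.com URL for a github.com file URL.
function ghFileOf(path) {
  const url = new URL(path);
  url.host = 'raw.githubusercontent.com';
  // Drop the "/blob" segment that follows "/<owner>/<repo>" in the path.
  url.pathname = url.pathname.replace(/^(\/[^/]+\/[^/]+)\/blob\//, '$1/');
  return `${url}`;
}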