how to use duckdb and DuckDBClient

undef · July 25, 2024, 10:04am

Hi,
I’m trying to access the duckdb instance created by DuckDBClient
(so I can use the registerFileXYZ apis from Data Ingestion – DuckDB)

I naively tried the below snippet which doesn’t work.(i’ve replace the js/sql blocks with triple` as it messes with the forum formatting)

triple`js --echo
const db = await DuckDBClient.of({
	lut: FileAttachment("data/lut.parquet")
})

const sql = db.sql.bind(db);
triple`

triple`sql
SHOW TABLES;
-- shows lut
triple`

triple`js
// DuckDB-Wasm is available by default as duckdb in Markdown
console.log(duckdb.PACKAGE_VERSION);
// 1.28.0 -- we have duck

// https://observablehq.com/@cmudig/duckdb-client
// db() returns the DuckDB database
const fdb = await DuckDBClient.db();
// TypeError: DuckDBClient.db is not a function

const c = await fdb.connect();
await c.query(`
SHOW TABLES;
`);
triple`

What is the right way of accessing the underlying APIs of duckdb , either via DuckDBClient or via duckb - I do not want to instantiate an entirely separate instance obviously.

thanks!

mootari · July 25, 2024, 11:11am

You can access the underlying DuckDB instance via ._db:

You can find the implementation here: framework/src/client/stdlib/duckdb.js at 260f8f1bd6d524d01d3dd82400293b318da1aa06 · observablehq/framework · GitHub

Markdown lets you wrap code blocks both in ``` and ~~~. To share your code verbatim, use ~~~.

undef · July 25, 2024, 11:41am

Hi and thanks for the fastreply,
but I’m sorry to say that’s a little too cryptic for me - could you maybe give an example of how I can still use the sql code blocks in markdown and the duckdb api.

I tried this, but no joy.

```js
const fdb = db._db;
//TypeError: fdb.query is not a function

await fdb.query(`
SHOW TABLES;
`);
```

mootari · July 25, 2024, 12:57pm

You skipped creating the connection in your second attempt. (Also keep in mind that you can still continue to use the DuckDBClient abstractions like db.sql`SHOW TABLES`.)

Can you say more about the additional types of files that you want to register? DuckDBClient.of({}) lets you pass in multiple data sources.

undef · July 25, 2024, 1:32pm

You skipped creating the connection in your second attempt.
:face_palm:

my use case isn’t not really about additional types of files per se; they’re (particularly malformed) csv’s. I already have sql available to deal with that, relying on read_text - i.e. not using the duckdb builtin parser.
Also, the existing sql relies on the path (/lookups, /data… ) and then globs the lot into the correct view - i.e. combines the main data set with user data added.

I,m trying/hoping for a straightforward path where I could just upload the usrfile in their correct locations in the wasm filesystem and keep the sql unchanged.

thanks for the help, and love to hear other ideas about the approach.

mootari · July 25, 2024, 10:32pm

To me that sounds like the sort of data wrangling that you’d normally do in a dataloader, so that it wouldn’t need to run in the user’s browser.

Can you say more about what you want to accomplish in general, e.g. why you picked Framework for this task?

undef · July 26, 2024, 8:45am

it’s complicated… and there have been various experiments over time. Including a data wrangling pipeline or two. duckdb (ideally duckdb-wasm) stuck. Other experiments involved perspective, mosaic, vega/lite, evidence …

the core of the use case is a large central data set (the static part) combined with a small user generated data set (the unpredictably messy part) visualized as a static dashboard. so it’s the old 80/20 rule … i guess.

reasons for trying Framework

duckdb
local first (duckdb-wasm)
static (nothing to manage besides a dumb webserver)
batteries included (just works/looks fine out of the box)
libraries

mootari · July 26, 2024, 10:56am

That still sounds to me like you’d want to do the messy wrangling part in server-side code via a data loader and then have it output a cleaned up file (e.g. parquet, arrow, JSON …, depending on how you want to consume it in your dashboard)? In which case DuckDBClient or FileAttachments wouldn’t come into play yet.

https://observablehq.com/framework/loaders

mbostock · July 26, 2024, 12:36pm

It sounds to me like you’re trying to create a DuckDBClient with some tables coming from a FileAttachment (either static files or created by a data loader) and other tables coming from the user. For the latter you can use Inputs.file to let the user select a file from their local file system.

Observable Framework’s SQL front matter doesn’t support user-specified files (since you need to render a file input somewhere), but you can do the same thing by using the underlying DuckDBClient.of method in JavaScript. For example, here the widgets table comes from the widgets.csv file, while the others table comes from a user-specified file:

```js
const db = await DuckDBClient.of({
  widgets: FileAttachment("widgets.csv"),
  ...otherFile && {others: otherFile.csv()}
});
```

The user input is declared as:

```js
const otherFile = view(Inputs.file({label: "File", accept: ".csv"}));
```

If you want to use this database for your SQL cells, you can override the sql built-in like so:

const sql = db.sql.bind(db);

I’m using JavaScript to parse the file here (file.csv, which is implemented by d3-dsv), but you could load it as text and use a different technique if you prefer. You can also create tables using SQL commands.

const url = await file.url();
const buffer = await file.arrayBuffer();
await db._db.registerFileBuffer("other.csv", new Uint8Array(buffer));

I’m not sure if you can use read_text, though. I get Table Function with name read_text does not exist! if I try this:

SELECT * FROM read_text('other.csv');

undef · July 26, 2024, 1:19pm

thanks for the examples, that’ll help. including a file in the front matter like that … is indistinguishable from magic ( to me, guess I’m not sufficiently advanced

read_text (and read_blob) are additions in more recent versions (0.10.0) of duckdb, combined with regexp_split_to_table on \n it’s pretty much the same as using read_csv with \b \v as seperator (but that functionality is now removed from duckdb)

Topic		Replies	Views
Loading a duckdb database Community Help	7	535	March 23, 2024
Can I create a table from parquet in duckdb with "DuckDBClient.of"? Community Help	8	661	November 29, 2022
Inputs.file & DuckDB Community Help	4	499	July 16, 2023
Populating a DuckDb database cell with remote binary file (parquet) Community Help	2	282	January 10, 2023
DuckDB: load multiple remote parquet files dinamically Community Help	7	1271	June 5, 2024

how to use duckdb *and* DuckDBClient

Related topics

how to use duckdb and DuckDBClient