Data table cell arrow data support

I use @uwdata/vgplot with duckdb-wasm to query data over HTTP. If the vgplot query returns an array, the data table cell accepts it, but if the query returns Arrow, the data table cell reports this error:

Are you able to share your notebook (or a simplified example)?

Here is an example: Untitled / sn | Observable

Can you share your repro steps? I only manage to trigger the error every once in a while if I randomly select and deselect columns.

Sorry, I had set the default data source to vg_data_array_is_ok. Did you change the data table cell's source to vg_data_arrow? I get this error every time.

Looks like the Parquet file isn't served correctly. I can see CORS errors in the network tab.

Sorry, the CORS configuration was revoked by IaC. Please try again, or pass a url parameter to override the default Parquet file with one that is reachable from your network, like this: https://observablehq.com/@cnsn/data-preview2?url=https//www.to/file.parquet

Now I’m also getting the error consistently. I suspect that some race conditions and/or side effects might be at play. Perhaps you can try to model the dependencies more explicitly?

It looks like vg_data_arrow does not register as table:

(await DuckDBClient.of({vg_data_arrow, penguins})).sql`show tables`

After changing the type to json and playing around some more with other Arrow examples, I can now see that this is likely a bug on our end.

I’ve filed an internal issue, but I’m afraid I don’t know how soon we’ll be able to fix it. :pray:

I suspect this is caused by

I found that Inputs.table can accept vg_data_arrow correctly:

Data table cells use DuckDB under the hood, so they inherit some of the bugs. Inputs.table() simply requires an Iterable of objects and does not look at the schema at all.
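To illustrate what "an Iterable of objects" means in practice, here is a minimal sketch of a renderer that, like the behavior described for Inputs.table(), only iterates over rows and reads their properties. `renderRows` and `FakeTable` are hypothetical names for illustration; this is not Observable's implementation.

```javascript
// Hypothetical renderer: accepts any Iterable of objects and never
// inspects the schema or the constructor of its input.
function renderRows(rows, columns) {
  const lines = [];
  for (const row of rows) {
    // Each cell is a plain property lookup on the row object.
    lines.push(columns.map((c) => String(row[c])).join("\t"));
  }
  return lines.join("\n");
}

// Hypothetical stand-in for a columnar table from some other library:
// data is stored by column, but rows are exposed via Symbol.iterator.
class FakeTable {
  constructor(columns) {
    this.columns = columns;
    this.numRows = Object.values(columns)[0].length;
  }
  *[Symbol.iterator]() {
    for (let i = 0; i < this.numRows; i++) {
      const row = {};
      for (const [name, values] of Object.entries(this.columns)) {
        row[name] = values[i];
      }
      yield row;
    }
  }
}

const plainArray = [{ id: 1, label: "a" }, { id: 2, label: "b" }];
const columnar = new FakeTable({ id: [1, 2], label: ["a", "b"] });

// Both inputs render identically because only iteration is used.
console.log(renderRows(plainArray, ["id", "label"]));
console.log(renderRows(columnar, ["id", "label"]));
```

Because nothing here checks where the table came from, a table built by any copy of a library works, which is why the class-identity problem discussed in this thread does not affect this kind of consumer.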

As a workaround you can use

vg_data_array = vg.coordinator().query(sql, {type: "json", /* ... */})

(By the way, you don’t need await at the top level - Observable’s Runtime automatically awaits Promises returned by cells.)

As mentioned, the underlying problem is that three different module instances (and versions) of Apache Arrow are being loaded. Mosaic produces the Arrow Table with a different arrow module instance than the one Observable's Stdlib uses, so the Table then fails DuckDB's instance checks. Observable's DuckDBClient.of correctly detects Arrow Tables regardless of version because it uses duck typing (no pun intended), but DuckDB itself then fails to detect the Table and skips it.
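The mismatch can be reproduced without any Arrow code. In the sketch below, `makeArrowModule` is a hypothetical stand-in for loading one copy of a library: each call defines a fresh `Table` class, the way two separately loaded module URLs each get their own class. A duck-typed check accepts tables from either copy, while an `instanceof` check only accepts tables from the copy it was written against. The properties inspected (`schema`, `batches`) are assumptions for illustration, not the exact checks used by Observable or Arrow.

```javascript
// Hypothetical stand-in for "loading the library": every call creates a
// brand-new Table class, just like two separate module instances would.
function makeArrowModule(version) {
  class Table {
    constructor(rows) {
      this.schema = { fields: Object.keys(rows[0] ?? {}) };
      this.batches = [rows];
      this.version = version; // only for inspection, not used in checks
    }
  }
  return { Table };
}

const mosaicArrow = makeArrowModule("newer");  // copy that produces the table
const duckdbArrow = makeArrowModule("11.0.0"); // copy the consumer checks against

const table = new mosaicArrow.Table([{ id: 1 }, { id: 2 }]);

// Duck typing: "does it look like a Table?" Works across module copies.
function looksLikeArrowTable(value) {
  return value != null && typeof value === "object" &&
    "schema" in value && Array.isArray(value.batches);
}

// Identity check: "was it built by *my* Table class?" Fails across copies.
function isMyArrowTable(value) {
  return value instanceof duckdbArrow.Table;
}

// Two-stage registration mirroring the failure described above: the outer
// layer accepts the value, the inner layer silently skips it.
function registerTables(sources) {
  const registered = [];
  for (const [name, value] of Object.entries(sources)) {
    if (!looksLikeArrowTable(value)) continue; // outer, duck-typed check
    if (!isMyArrowTable(value)) continue;      // inner check: skipped silently
    registered.push(name);
  }
  return registered;
}

console.log(looksLikeArrowTable(table)); // true
console.log(isMyArrowTable(table));      // false
console.log(registerTables({ vg_data_arrow: table })); // []
```

No error is raised anywhere in this model; the table simply never gets registered, which matches the symptom of vg_data_arrow missing from `show tables`.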

To give an example, try the following:

// Import the same arrow module URL as used by Observable's DuckDB - modules are singletons
arrow11 = import("https://cdn.observableusercontent.com/npm/apache-arrow@11.0.0/+esm")
// Convert to an arrow 11 Table
vg_data_arrow = arrow11.tableFromJSON(
  (await vg.coordinator().query(sql, {type: 'arrow', cache: false})).toArray()
)

Your data table cells will now let you select “vg_data_arrow” as a table and correctly show the data.
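The workaround above works because it round-trips through plain objects: `toArray()` drops the producing module's class identity, and `tableFromJSON` from the consumer's own module builds a fresh Table that passes that module's `instanceof` checks. The sketch below models the round trip with a hypothetical `makeArrowModule` stand-in; `tableFromRows` and `toPlainRows` are illustrative names, not Arrow's API.

```javascript
// Hypothetical stand-in for one loaded copy of an Arrow-like library.
function makeArrowModule() {
  class Table {
    constructor(rows) {
      this.rows = rows;
    }
    // Plays the role of Arrow's toArray(): expose the rows.
    toArray() {
      return this.rows;
    }
  }
  // Plays the role of tableFromJSON(): build a Table from plain rows.
  const tableFromRows = (rows) => new Table(rows);
  return { Table, tableFromRows };
}

const producer = makeArrowModule(); // e.g. the copy Mosaic uses
const consumer = makeArrowModule(); // e.g. the copy the checks run against

const original = producer.tableFromRows([{ id: 1 }, { id: 2 }]);

// Copy each row into a plain object so no class identity survives.
const toPlainRows = (table) => table.toArray().map((row) => ({ ...row }));
const rebuilt = consumer.tableFromRows(toPlainRows(original));

console.log(original instanceof consumer.Table); // false
console.log(rebuilt instanceof consumer.Table);  // true
```

The trade-off is that every row is copied, so this approach costs extra time and memory on large results.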

I also tried to mask the instance by proxying getPrototypeOf, but unfortunately it seems that the format has changed too much to be backwards compatible with Arrow 11.
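For context, `instanceof` walks the prototype chain through the object's internal [[GetPrototypeOf]] method, which a `Proxy` can intercept with a `getPrototypeOf` trap. The sketch below uses hypothetical `OldTable`/`NewTable` classes (not Arrow's) to show why that alone is not enough: the proxy passes the identity check, but code written against the old layout still reads internal fields that the new layout no longer has.

```javascript
// Hypothetical "old" layout that the consumer was written against.
class OldTable {
  constructor() {
    this.chunks = [];
  }
}
function countRowsOld(table) {
  if (!(table instanceof OldTable)) throw new TypeError("not an OldTable");
  // Relies on the old internal field name.
  return table.chunks.reduce((n, chunk) => n + chunk.length, 0);
}

// Hypothetical "new" layout: same idea, renamed internals.
class NewTable {
  constructor(batches) {
    this.batches = batches;
  }
}

const newTable = new NewTable([[1, 2], [3]]);

// Mask the instance: report OldTable.prototype as its prototype.
const masked = new Proxy(newTable, {
  getPrototypeOf() {
    return OldTable.prototype;
  },
});

console.log(masked instanceof OldTable); // true: the identity check passes

let failure = null;
try {
  countRowsOld(masked);
} catch (err) {
  // masked.chunks is undefined on the new layout, so .reduce throws.
  failure = err;
}
console.log(failure instanceof TypeError); // true
```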

I was misremembering. It’s actually Arrow that performs the instanceof checks:

Thanks a lot for your time. Performance is not my top priority at the moment, so I can use the array instead of Arrow. I’m only using vg.coordinator().query() to get data over HTTP from AWS S3. I found the solution you posted in the other channel, thanks again! :smile: