A data loader that produces a parquet file

saptarshiguha · March 21, 2024, 9:05am

Hello,
Firstly, Observable Framework is really nice: so many things are included from the start, and it is aesthetically lovely. Thanks!

I have a question about data loaders. I want to use DuckDBClient on a parquet file e.g.

DuckDBClient.of({afm: FileAttachment("./data/poem_data.parquet")})

but the parquet file is created by poem_data.parquet.py.

The only way I can get Observable Framework (OF) to read this is to run poem_data.parquet.py beforehand, which produces the parquet file on disk; the FileAttachment above then reads it successfully.

How would I get OF to trigger the file creation?

thanks again!
Saptarshi

Fil · March 21, 2024, 10:58am

Hi, poem_data.parquet.py must not create a file on disk, but instead must send the parquet payload to stdout. (It can create a temporary file on disk then “cat” it, but what Framework reads is the stdout.)

I’m afraid I don’t have an example with Python, but see Source code | FPDN, which does this in bash with duckdb.

mbostock · March 21, 2024, 2:49pm

Also, is poem_data.parquet.py in the data directory (in the same place that you expect the poem_data.parquet file to exist, but with the additional .py extension)?

Also, if you generated a poem_data.parquet file manually, note that this will take precedence over the adjacent data loader poem_data.parquet.py. You should delete or move the static file you created if you want the data loader to run on-demand.
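
A quick way to check for the precedence issue described above is to look for loader scripts that sit next to a static file with the same target name. The following is only a sketch: the function name shadowed_loaders and the LOADER_SUFFIXES set are made up for illustration, and the suffix list is an assumed, non-exhaustive subset of the loader types Framework supports.

```python
from pathlib import Path

# Assumed, non-exhaustive set of loader script extensions; adjust for your project.
LOADER_SUFFIXES = {".py", ".js", ".sh", ".R"}


def shadowed_loaders(data_dir):
    """Return loader scripts whose target file already exists as a static file."""
    shadowed = []
    for script in sorted(Path(data_dir).iterdir()):
        if script.suffix not in LOADER_SUFFIXES:
            continue
        # poem_data.parquet.py -> poem_data.parquet
        target = script.with_suffix("")
        # Only names with a second extension look like data loaders.
        if target.suffix and target.is_file():
            shadowed.append(script)
    return shadowed
```

Calling shadowed_loaders("data") from the project's source root lists the loaders that will not run until the static file next to them is deleted or moved.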

saptarshiguha · March 21, 2024, 4:13pm

Yes, all correct. I was thinking that I would have to write the parquet file to stdout (as binary), and you confirmed it.
Thanks again.

bjedwards · November 27, 2024, 10:26pm

I came across this thread while trying to solve the same problem. Here is a snippet that worked for me. It assumes the data is in a pandas data frame named df.

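import sys
import tempfile

# ... some query or computation that produces df, a pandas data frame ...

# Write the parquet bytes to a temporary file, then copy them to binary
# stdout, which is what Framework reads from a data loader.
with tempfile.TemporaryFile() as f:
    df.to_parquet(f)
    f.seek(0)  # rewind before reading the bytes back
    sys.stdout.buffer.write(f.read())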
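
If the data fits comfortably in memory, the temporary file in the snippet above can be swapped for an in-memory buffer. This is a minimal sketch of that variation, not something posted in the thread: emit_parquet is a made-up helper name, and it assumes df is a pandas data frame with a parquet engine such as pyarrow installed.

```python
import io
import sys


def emit_parquet(df, out=None):
    """Serialize df to parquet in memory and write the bytes to out.

    out defaults to binary stdout, which is what Framework reads.
    """
    if out is None:
        out = sys.stdout.buffer
    buf = io.BytesIO()
    df.to_parquet(buf)  # pandas accepts a binary file-like object here
    out.write(buf.getvalue())
    out.flush()
```

In poem_data.parquet.py, the last line would then be emit_parquet(df), placed after whatever code builds df. Nothing else in the script should print to stdout, since any extra output would end up inside the parquet payload.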
