strange data loader error (Framework)

I wanted to use a data loader to do some preprocessing on a parquet file, so I wrote a bash script like:

cd -- "$(dirname "$0")"
duckdb <<'END_OF_SQL' 
-- some sql here, producing a table tbl
copy tbl to '/dev/stdout' (format parquet, codec zstd);
END_OF_SQL

Unfortunately this does not work. That is, it works just fine when launched from the command line, but not as a data loader. The error as seen in Observable is

IO Error: Cannot open file "/dev/stdout": No such device or address

I can work around this by changing the copy tbl line to

copy tbl to 'data/tbl.parquet' (format parquet, codec zstd);

and adding

cat data/tbl.parquet

as the last line in the file.
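
For reference, the same temp-file workaround can be folded into a small helper that cleans up after itself. This is only a sketch under a few assumptions: `emit_via_tempfile` and `duckdb_copy_to` are hypothetical names, the query is a placeholder, and a throwaway directory from `mktemp -d` stands in for `data/tbl.parquet`.

```sh
#!/usr/bin/env sh
# Sketch: let a producer write to a real file, then stream that file to stdout.
# emit_via_tempfile and duckdb_copy_to are hypothetical helper names.

# Run the given command with a temp file path appended as its last argument,
# then copy that file to stdout. The body runs in a subshell, so its
# variables stay local and `exit` only leaves the function.
emit_via_tempfile() (
  tmpdir="$(mktemp -d)" || exit 1
  status=0
  { "$@" "$tmpdir/out" && cat -- "$tmpdir/out"; } || status=$?
  rm -rf -- "$tmpdir"
  exit "$status"
)

# Producer: have DuckDB write Parquet to the path given as $1.
# The heredoc delimiter is unquoted so that $1 is expanded into the SQL;
# this assumes the path contains no single quotes.
duckdb_copy_to() {
  duckdb :memory: <<END_OF_SQL
copy (select * from range(100) tbl(i)) to '$1' (format parquet, codec zstd);
END_OF_SQL
}
```

The loader would then end with `emit_via_tempfile duckdb_copy_to`, so DuckDB never has to open /dev/stdout itself.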

A little puzzling, isn’t it?

I don’t see that error, but I do see a different one, FATAL Error: fsync failed!, which matches this DuckDB issue (from another Observable Framework user):

I believe the bug is that DuckDB is trying to fsync on stdout, which isn’t supported.

If you don’t mind playing a little fast and loose by ignoring all DuckDB errors, you could try this instead to avoid the temp file:

duckdb :memory: <<EOF || true
copy (select * from range(100) tbl(i)) to '/dev/stdout' (format parquet, codec zstd)
EOF
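
Since `|| true` hides every failure, a narrower variant is to forgive only the one known error. The sketch below assumes that DuckDB exits non-zero and prints the fsync message to stderr, which I have not verified; `run_tolerating` is a hypothetical helper name.

```sh
#!/usr/bin/env sh
# run_tolerating PATTERN CMD...: run CMD, passing its stdout through untouched.
# If CMD fails and its stderr contains PATTERN, report success anyway;
# otherwise replay the captured stderr and keep CMD's exit status.
run_tolerating() (
  pattern="$1"
  shift
  errfile="$(mktemp)" || exit 1
  status=0
  "$@" 2>"$errfile" || status=$?
  if [ "$status" -ne 0 ] && grep -qF -- "$pattern" "$errfile"; then
    status=0                 # the one failure we chose to ignore
  else
    cat -- "$errfile" >&2    # surface everything else
  fi
  rm -f -- "$errfile"
  exit "$status"
)
```

It would be used as `run_tolerating 'fsync failed' duckdb :memory: <<EOF`, with the same SQL and closing `EOF` as before. Any other DuckDB error still fails the loader.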

We’re spawning the process here:

Perhaps there’s a different way to spawn a process that avoids this error? :thinking:

I am running into this problem as well.

I have a very minimal data loader, data/test.csv.sh, with the contents

#!/usr/bin/env sh
duckdb -c "COPY (SELECT 1) TO STDOUT WITH (FORMAT CSV);"

Running it directly with ./src/data/test.csv.sh works as expected, but when Framework runs it I get the error Cannot open file "/dev/stdout".

I forgot to mention, I’m using Linux, while you’re on macOS (my guess), so that would explain the different errors.

I should also stress that the script where duckdb writes directly to /dev/stdout runs correctly from the command line and reports no errors. I of course checked, by redirecting stdout to a file, that it produces a valid Parquet file with the expected content.
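
A quick way to repeat that kind of check is to look for the 4-byte `PAR1` magic that Parquet files carry at both the start and the end. This is a minimal sketch, with `looks_like_parquet` as a hypothetical helper name; it is a cheap sanity check and does not validate the file's contents.

```sh
#!/usr/bin/env sh
# looks_like_parquet FILE: succeed if FILE starts and ends with the
# Parquet magic bytes "PAR1". Missing or unreadable files simply fail.
looks_like_parquet() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "PAR1" ] &&
    [ "$(tail -c 4 "$1" 2>/dev/null)" = "PAR1" ]
}
```

For example, redirect the loader's stdout to a file of your choosing and then run `looks_like_parquet` on that file; loading it back into DuckDB is still the better test of the actual content.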

I’ve figured out how to fix this in Framework with a very minimal change to how we direct the output of the child process to a file:

Assuming this lands, we should get a fix released in the next day or two.
