I’m really excited by the introduction of DuckDB in the Observable Standard Library! It makes working with a wide range of data really consistent (and tbh I need to sharpen up my SQL skills).
One reason I particularly like DuckDB is all of the recent talk about Parquet files. I already publish a bit of data in this format, and the idea of pulling data straight down from a remote Parquet file is really appealing.
That said, it would be amazing if DuckDB fetched only the parts of the file a query needs, rather than downloading the whole Parquet file every time.
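For context on why range requests matter here: a Parquet file stores its schema and row-group metadata in a footer at the very end, followed by a 4-byte little-endian footer length and the magic bytes PAR1. A range-aware reader can therefore fetch the last 8 bytes, then the footer, and then only the column chunks a query touches. Below is a minimal Python sketch of the first two steps, computing the Range header values. The helper names are hypothetical and this is not DuckDB's actual code.

```python
import struct

PARQUET_MAGIC = b"PAR1"
TRAILER_SIZE = 8  # 4-byte little-endian footer length + 4-byte magic


def trailer_range(file_size: int) -> str:
    """Range header value for the 8-byte trailer at the end of the file."""
    # Smallest possible layout: leading magic (4 bytes) + trailer (8 bytes).
    if file_size < len(PARQUET_MAGIC) + TRAILER_SIZE:
        raise ValueError("file too small to be Parquet")
    return f"bytes={file_size - TRAILER_SIZE}-{file_size - 1}"


def footer_range(file_size: int, trailer: bytes) -> str:
    """Range header value for the footer metadata, given the fetched trailer."""
    if len(trailer) != TRAILER_SIZE or trailer[4:] != PARQUET_MAGIC:
        raise ValueError("trailer does not end with the PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", trailer[:4])
    start = file_size - TRAILER_SIZE - footer_len
    if start < len(PARQUET_MAGIC):
        raise ValueError("footer length is larger than the file allows")
    # The footer ends just before the trailer; Range end offsets are inclusive.
    return f"bytes={start}-{file_size - TRAILER_SIZE - 1}"


# Hypothetical example: a 17,000,000-byte file whose footer is 1,234 bytes long.
size = 17_000_000
print(trailer_range(size))  # bytes=16999992-16999999
print(footer_range(size, struct.pack("<I", 1234) + PARQUET_MAGIC))  # bytes=16998758-16999991
```

The remaining requests would then target individual column chunks, whose byte offsets are recorded in that footer.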
Going by my own tests, it doesn’t look like range requests are happening: the entire 17 MB Parquet file here is downloaded. I find the same thing using the Observable Standard Library in Quarto, even when I point it at a URL that responds with accept-ranges: bytes.
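One wrinkle when testing this: the response header is spelled Accept-Ranges, and RFC 9110 treats it as advisory. A server may honour Range without sending it, and a client may send Range regardless, so the header only hints at support; a 206 response to a ranged GET is the real confirmation. A small Python sketch of the header check (the function name is made up for illustration):

```python
def advertises_byte_ranges(headers: dict[str, str]) -> bool:
    """True if an Accept-Ranges header lists the "bytes" range unit."""
    for name, value in headers.items():
        # Header names and range units are both case-insensitive.
        if name.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units  # "none" means no range support
    # Header absent: inconclusive, treated here as "not advertised".
    return False


print(advertises_byte_ranges({"Accept-Ranges": "bytes"}))      # True
print(advertises_byte_ranges({"accept-ranges": "none"}))       # False
print(advertises_byte_ranges({"Content-Length": "17000000"}))  # False
```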
Is this something that already works with some adjustment, or is it something the team is thinking of adding?
It looks like file attachments aren’t served with the right headers for range requests. Perhaps they need access-control-allow-headers: range? Is that something the team could change at some point?
(For my own use in Quarto, I can control where I download data from, and I’m guessing Observable has attachment size limits that make range requests less useful. It’d still be nice, though!)
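On the header question: for a cross-origin ranged request from the browser, two CORS headers are usually relevant. If the browser sends a preflight for the Range header, the preflight response needs to allow it via Access-Control-Allow-Headers; and if the client wants to read Content-Range from the 206 response (for example to learn the total file size), the server has to list it in Access-Control-Expose-Headers, since Content-Range is not a CORS-safelisted response header. A minimal Python sketch of those two checks, assuming headers arrive as a plain dict; it ignores the other checks a browser performs (Access-Control-Allow-Origin and so on), and the function names are hypothetical.

```python
def _tokens(headers: dict[str, str], name: str) -> set[str]:
    """Lower-cased, comma-separated tokens of a header (empty if absent)."""
    for key, value in headers.items():
        if key.lower() == name:
            return {tok.strip().lower() for tok in value.split(",") if tok.strip()}
    return set()


def preflight_allows_range(preflight: dict[str, str], credentials: bool = False) -> bool:
    allowed = _tokens(preflight, "access-control-allow-headers")
    # "*" only acts as a wildcard for requests made without credentials.
    return "range" in allowed or ("*" in allowed and not credentials)


def exposes_content_range(response: dict[str, str], credentials: bool = False) -> bool:
    exposed = _tokens(response, "access-control-expose-headers")
    return "content-range" in exposed or ("*" in exposed and not credentials)


print(preflight_allows_range({"Access-Control-Allow-Headers": "Range"}))                # True
print(preflight_allows_range({"Access-Control-Allow-Headers": "*"}, credentials=True))  # False
print(exposes_content_range({"Access-Control-Expose-Headers": "Content-Length"}))      # False
```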
You can already make range requests for file attachments, but DuckDB’s detection method seems flawed. DuckDB issues a HEAD request with a Range header and then checks whether the response status is 206 Partial Content:
This in itself seems strange, as the HTTP spec (RFC 9110) says a server must ignore the Range header for any method other than GET, so a server that follows it will answer the HEAD request with a plain 200.
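To illustrate why that probe can misfire, here is a toy Python model of a server that follows the spec, honouring Range only on GET, together with two probes: one shaped like the HEAD-based check described above, and one that instead sends a one-byte ranged GET and reads the total size from Content-Range. Everything here is hypothetical and simplified (single bytes=start-end ranges only, no error handling); the exact Range value DuckDB sends is an assumption, and none of this is DuckDB's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def compliant_server(method: str, headers: dict[str, str], data: bytes) -> Response:
    """Toy server: honours 'Range: bytes=start-end' on GET only, as RFC 9110
    requires, and assumes the requested range is satisfiable."""
    size = len(data)
    range_value = headers.get("Range", "")
    if method == "GET" and range_value.startswith("bytes="):
        first, _, last = range_value[len("bytes="):].partition("-")
        start, end = int(first), min(int(last), size - 1)
        return Response(206, {"Content-Range": f"bytes {start}-{end}/{size}"},
                        data[start:end + 1])
    # Any other case (including HEAD): Range is ignored, full representation.
    body = b"" if method == "HEAD" else data
    return Response(200, {"Accept-Ranges": "bytes", "Content-Length": str(size)}, body)


Server = Callable[[str, dict[str, str], bytes], Response]


def head_probe(server: Server, data: bytes) -> bool:
    """HEAD + Range, then look for 206 (the style of check described above)."""
    return server("HEAD", {"Range": "bytes=0-0"}, data).status == 206


def get_probe(server: Server, data: bytes) -> Optional[int]:
    """One-byte ranged GET; returns the total size on 206, else None."""
    resp = server("GET", {"Range": "bytes=0-0"}, data)
    if resp.status != 206:
        return None
    return int(resp.headers["Content-Range"].rsplit("/", 1)[1])


data = bytes(range(10))
print(head_probe(compliant_server, data))  # False: HEAD ignores Range
print(get_probe(compliant_server, data))   # 10: ranged GET works
```

Against a spec-following server, the HEAD probe reports no range support even though a ranged GET succeeds, which is consistent with the behaviour described here.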
A response for an actual range request looks like this: