I’m really excited by the introduction of DuckDB in the Observable Standard Library! It makes working with a wide range of data really consistent (and tbh I need to sharpen up my SQL skills).
One reason I particularly like DuckDB is all of the recent talk about Parquet files. I already publish a bit of data in this format, and the idea of pulling data straight down from a remote Parquet file is really appealing.
That said, it would be amazing if I wasn’t downloading the whole Parquet file when I did it.
Going by my own tests, it doesn’t look like range requests are happening: the entire 17 MB Parquet file here is downloaded. I find the same thing using the Observable Standard Library in Quarto, even if I point it to a URL that responds with Accept-Ranges: bytes.
Is this something that already works with some adjustment, or is it something that the team is thinking of adding?
You can already do range requests for file attachments, but DuckDB’s detection method seems flawed. What DuckDB does is to issue a HEAD request with a Range header and then check the response status for 206 Partial Content:
This in itself seems strange, as the HTTP spec apparently says to ignore the Range header for anything but GET.
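For comparison, here is a minimal sketch of a probe that sticks to GET, which is the method servers are expected to honour Range on. This is not DuckDB's code: the function name supportsRangeRequests and the injectable fetchImpl parameter are my own assumptions, added only so the idea can be tried out and tested with a stub.

```javascript
// Hypothetical helper (not DuckDB's actual code): probe range support with
// a one-byte GET instead of a HEAD request.
// `fetchImpl` is an assumption added so the function can be tested with a stub.
async function supportsRangeRequests(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    method: "GET",
    headers: { Range: "bytes=0-0" } // ask for the first byte only
  });

  // A server that ignores Range answers 200 with the full body, so cancel
  // the stream to avoid downloading the whole file.
  if (response.body && typeof response.body.cancel === "function") {
    await response.body.cancel();
  }

  // 206 Partial Content means the server honoured the Range header.
  return response.status === 206;
}
```

Calling await supportsRangeRequests(url) against a given file URL should then resolve to true only when the server actually answers with a partial response.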
A response for an actual range request looks like this: