Creating a parquet data loader with R

Hi, I’m trying to write a data loader in R that creates and loads a Parquet file, but I’m running into an error that I’m not sure how to fix.

My R code is something like:

library(dplyr)
library(arrow)

cat(
  dplyr::starwars %>% 
    select(mass, height) %>%
    write_dataset(
      path = ".",
      format = "parquet",
      basename_template = "starwars_{i}.parquet"
    )
)

And I’m trying to access the output with:

const db = DuckDBClient.of({
  starwars: FileAttachment("starwars_0.parquet"),
});

display(db.query("SELECT * FROM starwars"));

I get the error:

RuntimeError: Invalid Error: Opening file 'starwars_0.parquet' failed with error: Failed to open file: starwars_0.parquet.

I’m guessing the problem is with how I’m creating the Parquet file. Any suggestions on how to fix this would be most appreciated!

Hi @chaz. Please ask for help with our open-source tools on GitHub rather than here. You can browse existing discussions and open a discussion for Observable Framework here:

For this question in particular, please note that data loaders need to write (a single file) to standard out (stdout); they shouldn’t write to a file path directly. You can find an example of an R data loader here:

I don’t know R well enough to say how to do that with write_dataset, but at least on macOS and Linux systems, you might be able to write to /dev/stdout. Good luck!
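
For what it’s worth, here is a minimal, untested sketch of that idea. It assumes a Unix-like system with `cat` available, and it swaps `write_dataset` (which writes a directory of files) for arrow’s `write_parquet`, which writes a single file. The temporary file and the `pipe("cat", "wb")` trick for getting binary bytes onto stdout are my assumptions, not something taken from the Framework docs:

```r
library(dplyr)
library(arrow)

# Write a single Parquet file to a temporary path (not into the project).
tmp <- tempfile(fileext = ".parquet")
dplyr::starwars %>%
  select(mass, height) %>%
  write_parquet(tmp)

# Read the file back as raw bytes and clean up the temporary file.
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# R's stdout() is a text-mode connection, so send the bytes through a
# binary pipe to `cat`, which inherits the process's stdout (Unix only).
out <- pipe("cat", "wb")
writeBin(bytes, out)
close(out)
```

If the loader were saved as, say, starwars.parquet.R (a hypothetical name), the page would reference FileAttachment("starwars.parquet") rather than starwars_0.parquet, since Framework names the output after the loader file minus its interpreter extension.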

No worries, thank you!