Run local Python or R script?

I’ve got my notebook where I want it (private at this point) and I’m looking for a way to work with the medium-sized source data. My current approach of attaching a .csv only fits about a third of the data in one file (I hit the 50 MB limit). Attaching a SQLite database file instead doesn’t reduce the file size much. I also didn’t notice much of a speed difference with the database, and I had problems with column types.

I can’t host the data on the internet. I’m wondering whether I could generate the source data dynamically from inputs, but that would require running a local Python (or R) script and awaiting its response, which doesn’t seem possible. Is that correct?

GPT-4 doesn’t help much either. It suggested this Node.js snippet:

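// Node.js only: starts the R script as a child process and streams its output.
const { spawn } = require('child_process');
const rScript = spawn('Rscript', ['/path/to/script.R', 'arg1', 'arg2']);

// Print whatever the script writes to stdout and stderr.
rScript.stdout.on('data', data => {
  console.log(`stdout: ${data}`);
});

rScript.stderr.on('data', data => {
  console.error(`stderr: ${data}`);
});

// Report the exit code once the process finishes.
rScript.on('close', code => {
  console.log(`child process exited with code ${code}`);
});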
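That snippet only runs in Node.js. A notebook runs in the browser, which cannot start local processes, so it doesn’t help inside the notebook itself. If you do run a local Node script, one way to actually await the script’s response is to wrap spawn in a Promise. This is a minimal sketch: the name runScript is hypothetical, and it assumes the script prints its result (for example, CSV text) to stdout.

```javascript
// Node.js only: child_process is not available in a browser notebook.
const { spawn } = require('child_process');

// Hypothetical helper: run a command and resolve with its collected stdout
// once it exits. Rejects if it cannot start or exits with a non-zero code.
function runScript(command, args = []) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = '';
    let stderr = '';

    // Decode as UTF-8 so multi-byte characters split across chunks stay intact.
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', chunk => { stdout += chunk; });
    child.stderr.on('data', chunk => { stderr += chunk; });

    // 'error' fires if the executable cannot be spawned (e.g. Rscript not on PATH).
    child.on('error', reject);

    child.on('close', code => {
      if (code === 0) {
        resolve(stdout);
      } else {
        reject(new Error(`${command} exited with code ${code}: ${stderr}`));
      }
    });
  });
}
```

Usage, inside an async function: const csvText = await runScript('Rscript', ['/path/to/script.R', 'arg1', 'arg2']);. Collecting the whole output before resolving keeps the calling code simple; for very large outputs you would stream instead.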

@kickout

Great question! I can offer two solutions:

  1. Convert your CSV to a Parquet file. We have seen some incredible size reductions (80 MB → 5 MB). Here is a notebook that can walk you through that. You can then attach the Parquet file and query it with a Data Table cell.

  2. Load the CSV file dynamically with a File input:
    viewof file = Inputs.file({label: "Data"})
    This lets you pick the file at runtime. It isn’t a file attachment; the file is simply loaded into your browser’s memory.
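Once the file is read in the browser, you control the parsing, which also gives you a place to set column types. The sketch below is plain JavaScript with hypothetical helper names (parseCsv, applyTypes) and a deliberately naive parser that assumes a header row and no quoted fields; in a real notebook you would more likely let a proper CSV parser handle the text and keep only the type step.

```javascript
// Hypothetical converters keyed by type name; extend as needed.
const converters = {
  number: v => (v === '' ? null : Number(v)),
  date: v => (v === '' ? null : new Date(v)),
  boolean: v => v === 'true', // anything other than "true" becomes false
  string: v => v,
};

// Naive CSV parser: assumes a header row and no quoted fields
// containing commas or newlines. Every value comes back as a string.
function parseCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(',');
  return lines.slice(1).map(line => {
    const cells = line.split(',');
    const row = {};
    header.forEach((name, i) => { row[name] = cells[i] ?? ''; });
    return row;
  });
}

// Apply a schema (column name -> type name); unlisted columns stay strings.
function applyTypes(rows, schema) {
  return rows.map(row => {
    const typed = { ...row };
    for (const [column, type] of Object.entries(schema)) {
      if (column in typed) typed[column] = converters[type](typed[column]);
    }
    return typed;
  });
}

// Example: id and value become numbers, active becomes a boolean.
const example = applyTypes(
  parseCsv('id,value,active\n1,3.5,true\n2,,false'),
  { id: 'number', value: 'number', active: 'boolean' }
);
console.log(example);
```

Keeping the schema separate from the parser means you can change a column’s type without touching the parsing code.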

Interesting! What if I need to change the column types?

Excellent reduction though!

On the DuckDB page you mention that the easiest way to get the data into a notebook is a Data Table cell. Is there any way to use an ‘await’ statement to let that cell finish? As is, it seems to populate asynchronously and triggers many cells to recalculate several times before the full data is loaded.

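The array has a done property that indicates when all of the data has loaded. If you want to make dependent cells wait until then, you can use an intermediary cell:

allData = data.done ? data : invalidation

While the data is still loading, this cell returns the pending invalidation promise, so cells that reference allData wait instead of recomputing on every partial update.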
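Outside a notebook, the same gating idea can be sketched with an async generator standing in for the progressively loading table. This is only an illustration: loadInChunks and waitUntilDone are hypothetical names, and Observable’s runtime and invalidation promise are not reproduced. The point is that downstream work only sees the snapshot whose done flag is true.

```javascript
// Simulates a progressively loaded table: each yield is a larger snapshot
// of the rows, tagged with a `done` flag like the Data Table cell's array.
// chunkSize must be a positive integer, or the loop never finishes.
async function* loadInChunks(allRows, chunkSize) {
  for (let end = chunkSize; ; end += chunkSize) {
    const snapshot = allRows.slice(0, end);
    snapshot.done = end >= allRows.length;
    yield snapshot;
    if (snapshot.done) return;
  }
}

// Mirrors `allData = data.done ? data : invalidation`: skip partial
// snapshots and hand only the complete array to downstream work.
async function waitUntilDone(snapshots) {
  for await (const snapshot of snapshots) {
    if (snapshot.done) return snapshot;
  }
  throw new Error('loader finished without a complete snapshot');
}

// Example: five rows arrive as snapshots of 2, 4, then 5 rows,
// but the callback runs once, with all five.
const sampleRows = Array.from({ length: 5 }, (_, i) => ({ id: i }));
waitUntilDone(loadInChunks(sampleRows, 2)).then(complete => {
  console.log(`downstream ran once with ${complete.length} rows`);
});
```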