🏠 back to Observable

How to left-join CSV files?

I would like to know if there is a more convenient approach than what I’m currently using to left-join two (or more) CSV files:

I’m using the instructions for combining datasets provided by Learn JS Data, but it seems weird to have to write my own join function for such a common task.

data-forge looks like a pretty good JS equivalent for R’s dplyr or Python’s pandas, but I haven’t been able to import it in Observable (I’ve tried the two options given by the module require debugger) :thinking:

Here’s how you can import data-forge:

dataForge = require('https://unpkg.com/data-forge@1.7.7/dist/web/index.js').catch(() => window.dataForge)
1 Like

Thanks, Jed! That works. Unfortunately I’m still not able to get the join with data-forge to work (I get a null dataframe). I’m not sure where to post it, so I’ve asked on StackOverflow:

const a = new DataFrame([{ A: 10, b: 1 }, { A: 20, b: 2 }, { A: 30, b: 3 }]);
const b = new DataFrame([{ A: 10, c:11 }, { A: 20, c:22 }, { A: 30, c:33 }]);
const ab = a.join(
  b,
  left => left.A,
  right => right.A,
  (left, right) => {
    return { A: left.A, b: left.b, c: right.c };
  }
);
// {content: null}
1 Like

Writing your own join function has the advantage that you know exactly what is done, and can be possibly shorter to write. Here is another take, using a more functional approach with map and filter:

cars.map( record => {
  let r = record;
  r.joined = brands.filter( b => b.id == r.brand_id );
  return r
})


This just puts the complete matching record of the second table in a new key of the first table.

The strategy is that the result should be the first table, augmented. So let’s use .map to process each row of the first table. How do we then find which record or records to join from the second table ? .filter is designed for searching, so let’s use it. .filter also automatically deals with the case when there are multiple matches. Otherwise, let’s leave everything else alone as much as possible since that can be done separately.

Could be generalized. Just an idea.

1 Like

Here’s a join implemented using d3.group and d3.cross:

{
  const left = d3.group(brands, d => d.id);
  const right = d3.group(cars, d => d.brand_id);
  return Array.from(left).flatMap(([key, lvalues]) => {
    return d3.cross(lvalues, right.get(key) || [], (lvalue, rvalue) => {
      return [key, lvalue, rvalue];
    });
  });
}
1 Like