limiting facets and displaying aggregate values

Hello!

I’m exploring the change of address (COA) dataset from the US Postal Service, and I’d like to create small multiples for each state and show aggregates of a couple values.

The data looks something like this:

[{
  YYYYMM: 202001
  ZIPCODE: "00601"
  CITY: "ADJUNTAS"
  STATE: "PR"
  TOTAL FROM ZIP: 13
  TOTAL BUSINESS: 0
  TOTAL FAMILY: 0
  TOTAL INDIVIDUAL: 0
  TOTAL PERM: 0
  TOTAL TEMP: 0
  TOTAL TO ZIP: 0
}, {
  YYYYMM: 202001
  ZIPCODE: "00602"
  CITY: "AGUADA"
  STATE: "PR"
  TOTAL FROM ZIP: 25
  TOTAL BUSINESS: 0
  TOTAL FAMILY: 0
  TOTAL INDIVIDUAL: 27
  TOTAL PERM: 34
  TOTAL TEMP: 0
  TOTAL TO ZIP: 40
}, {
  ...
}]

Where each object represents a zip code. You can get a copy of the data yourself here.

I’m trying to facet by STATE, and in each facet, I’d like to show a bar for the sum of TOTAL FROM ZIP and TOTAL TO ZIP, and perhaps even a drive NET value that equals the difference between the aggregate TO and FROM values.

I’ve tried using the facet option with a groupX mark, but the values I’m interested in are not discrete, so I wonder if that’s the problem.

Is there a way to use the Plot API to transform the data into whatever format facet is looking for?

Should I just do the aggregation first and feed the calculated values to Plot, so they look like this on the way in?

[{
  "STATE": "PR",
  "TOTAL TO ZIP": 10000,
  "TOTAL FROM ZIP": 5000
}, {
  "STATE": "CA",
  "TOTAL TO ZIP": 5000000,
  "TOTAL FROM ZIP": 40000,
}, {
  ...
}]

Thanks!

Here’s what you could do:

Plot.plot({
  facet: { data, y: "STATE" },
  marks: [
    Plot.barX(
      data,
      Plot.groupZ(
        { x: "sum" },
        { x: "TOTAL FROM ZIP", fill: "green", insetBottom: 9 }
      )
    ),
    Plot.barX(
      data,
      Plot.groupZ({ x: "sum" }, { x: "TOTAL TO ZIP", fill: "red", insetTop: 9 })
    )
  ]
})

As you can see, I’ve “translated” your description of the chart almost word for word into the Plot API. The only difficulty here is groupZ, which groups all the values from a facet and creates a summary aggregate (in this case, the sum).

For the difference chart, I could do the same, but since we’re not using the y axis, I’m going to suggest an approach where we’d use y for the state, instead of faceting. The code becomes:

Plot.plot({
  marks: [
    Plot.barX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => Math.max(d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"], 0),
          fill: "steelblue",
          y: "STATE"
        }
      )
    ),
    Plot.barX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => Math.min(d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"], 0),
          fill: "orange",
          y: "STATE"
        }
      )
    )
  ],
  caption: "difference"
})

and I’m using groupY (group by Y) instead of groupZ—which might be a bit clearer.

You’re welcome to fork this notebook as a starting point.

@Fil Thanks! I appreciate your insight for the net difference chart; it’s a much easier way to understand what’s happening.

Is there a way to sort the y-axis on the net gain value?

And one note: I think the order is swapped in the arithmetic. It looks like it should be TO - FROM, as shown below:

const TO = 20;
const FROM = 8;
// net gain of 12
Math.max(TO - FROM, 0); // 12
Math.min(TO - FROM, 0); // 0

where as if it’s FROM - TO:

const TO = 20;
const FROM = 8;
// net gain of 12, registered as net loss of 12
Math.max(FROM - TO, 0); // 0
Math.min(FROM - TO, 0); // -12

Sorting can be done in the first chart by adding a mark with a sort option:

Plot.tickX(
      data,
      Plot.groupZ(
        { x: "sum" },
        {
          x: (d) => d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"],
          sort: { fy: "x", reduce: "sum" }
        }
      )
    )

however, this mark’s values can go into the negative, and it’s going to make this first chart a bit unbalanced. So I suggest we first treat the order in the second chart, by adding a mark that shows the balance and sorts the y axis:

Plot.tickX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"],
          inset: 2,
          y: "STATE",
          sort: { y: "x", reduce: "sum" }
        }
      )
    )

now, this defines the order of the domain of the y scale in the second chart, which we can reuse as the domain of the fy axis in the first chart!

fy: { domain: second.scale("y").domain }

I’m leaving the sign of the gain vs losses as an exercise for the reader :slight_smile:

1 Like

ah! yes I finally found my way to the text mark.

Plot.textX(
  y2021,
  Plot.groupY(
    { x: "sum" },
    {
      text: (d) =>
        d3.sum(d, (d) => d["TOTAL TO ZIP"]) -
        d3.sum(d, (d) => d["TOTAL FROM ZIP"]),
      y: "STATE",
      x: (d) => d["TOTAL TO ZIP"] - d["TOTAL FROM ZIP"],
    }
  )
)

this has been very helpful, thank you. But now you’re in trouble because I’ll just keep coming back to you when I need an assist!

one last question @fil… what if I wanted to set the text fill based on the net change? I’m not sure how to approach that.

maybe try something like