limiting facets and displaying aggregate values

noslouch · June 9, 2022, 1:58pm

Hello!

I’m exploring the change of address (COA) dataset from the US Postal Service, and I’d like to create small multiples for each state and show aggregates of a couple values.

The data looks something like this:

[{
  YYYYMM: 202001
  ZIPCODE: "00601"
  CITY: "ADJUNTAS"
  STATE: "PR"
  TOTAL FROM ZIP: 13
  TOTAL BUSINESS: 0
  TOTAL FAMILY: 0
  TOTAL INDIVIDUAL: 0
  TOTAL PERM: 0
  TOTAL TEMP: 0
  TOTAL TO ZIP: 0
}, {
  YYYYMM: 202001
  ZIPCODE: "00602"
  CITY: "AGUADA"
  STATE: "PR"
  TOTAL FROM ZIP: 25
  TOTAL BUSINESS: 0
  TOTAL FAMILY: 0
  TOTAL INDIVIDUAL: 27
  TOTAL PERM: 34
  TOTAL TEMP: 0
  TOTAL TO ZIP: 40
}, {
  ...
}]

Where each object represents a zip code. You can get a copy of the data yourself here.

I’m trying to facet by STATE, and in each facet, I’d like to show a bar for the sum of TOTAL FROM ZIP and TOTAL TO ZIP, and perhaps even a drive NET value that equals the difference between the aggregate TO and FROM values.

I’ve tried using the facet option with a groupX mark, but the values I’m interested in are not discrete, so I wonder if that’s the problem.

Is there a way to use the Plot API to transform the data into whatever format facet is looking for?

Should I just do the aggregation first and feed the calculated values to Plot, so they look like this on the way in?

[{
  "STATE": "PR",
  "TOTAL TO ZIP": 10000,
  "TOTAL FROM ZIP": 5000
}, {
  "STATE": "CA",
  "TOTAL TO ZIP": 5000000,
  "TOTAL FROM ZIP": 40000,
}, {
  ...
}]

Thanks!

Fil · June 9, 2022, 3:54pm

Here’s what you could do:

Plot.plot({
  facet: { data, y: "STATE" },
  marks: [
    Plot.barX(
      data,
      Plot.groupZ(
        { x: "sum" },
        { x: "TOTAL FROM ZIP", fill: "green", insetBottom: 9 }
      )
    ),
    Plot.barX(
      data,
      Plot.groupZ({ x: "sum" }, { x: "TOTAL TO ZIP", fill: "red", insetTop: 9 })
    )
  ]
})

As you can see, I’ve “translated” your description of the chart almost word for word into the Plot API. The only difficulty here is groupZ, which groups all the values from a facet and creates a summary aggregate (in this case, the sum).

For the difference chart, I could do the same, but since we’re not using the y axis, I’m going to suggest an approach where we’d use y for the state, instead of faceting. The code becomes:

Plot.plot({
  marks: [
    Plot.barX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => Math.max(d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"], 0),
          fill: "steelblue",
          y: "STATE"
        }
      )
    ),
    Plot.barX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => Math.min(d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"], 0),
          fill: "orange",
          y: "STATE"
        }
      )
    )
  ],
  caption: "difference"
})

and I’m using groupY (group by Y) instead of groupZ—which might be a bit clearer.

You’re welcome to fork this notebook as a starting point.

noslouch · June 9, 2022, 5:05pm

@Fil Thanks! I appreciate your insight for the net difference chart; it’s a much easier way to understand what’s happening.

Is there a way to sort the y-axis on the net gain value?

And one note: I think the order is swapped in the arithmetic. It looks like it should be TO - FROM, as shown below:

const TO = 20;
const FROM = 8;
// net gain of 12
Math.max(TO - FROM, 0); // 12
Math.min(TO - FROM, 0); // 0

where as if it’s FROM - TO:

const TO = 20;
const FROM = 8;
// net gain of 12, registered as net loss of 12
Math.max(FROM - TO, 0); // 0
Math.min(FROM - TO, 0); // -12

Fil · June 9, 2022, 5:31pm

Sorting can be done in the first chart by adding a mark with a sort option:

Plot.tickX(
      data,
      Plot.groupZ(
        { x: "sum" },
        {
          x: (d) => d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"],
          sort: { fy: "x", reduce: "sum" }
        }
      )
    )

however, this mark’s values can go into the negative, and it’s going to make this first chart a bit unbalanced. So I suggest we first treat the order in the second chart, by adding a mark that shows the balance and sorts the y axis:

Plot.tickX(
      data,
      Plot.groupY(
        { x: "sum" },
        {
          x: (d) => d["TOTAL FROM ZIP"] - d["TOTAL TO ZIP"],
          inset: 2,
          y: "STATE",
          sort: { y: "x", reduce: "sum" }
        }
      )
    )

now, this defines the order of the domain of the y scale in the second chart, which we can reuse as the domain of the fy axis in the first chart!

fy: { domain: second.scale("y").domain }

I’m leaving the sign of the gain vs losses as an exercise for the reader

noslouch · June 9, 2022, 9:08pm

ah! yes I finally found my way to the text mark.

Plot.textX(
  y2021,
  Plot.groupY(
    { x: "sum" },
    {
      text: (d) =>
        d3.sum(d, (d) => d["TOTAL TO ZIP"]) -
        d3.sum(d, (d) => d["TOTAL FROM ZIP"]),
      y: "STATE",
      x: (d) => d["TOTAL TO ZIP"] - d["TOTAL FROM ZIP"],
    }
  )
)

this has been very helpful, thank you. But now you’re in trouble because I’ll just keep coming back to you when I need an assist!

noslouch · June 9, 2022, 9:45pm

one last question @fil… what if I wanted to set the text fill based on the net change? I’m not sure how to approach that.

Fil · June 10, 2022, 6:36am

maybe try something like

Topic		Replies	Views
Sorting facets in Plot Community Help	4	1596	September 17, 2021
Adding Plot text mark for n value (sum/count) of each facet Community Help	5	675	October 11, 2021
grouped bar chart - reorder axis New To Observable	2	190	May 21, 2024
Observable Plot: Stack and Facet Community Help	1	572	August 26, 2022
Plot: Facets - filter facet frames and values Community Help	9	965	May 13, 2022

limiting facets and displaying aggregate values

Related topics