Use the output of group to filter

Hello all. This is probably a beginner issue with Plot.

The notebook is 2023-04 AWS EC2 pricing investigation / Frank Contrepois | Observable; see the second graph, under the title “Generations with a price not directly related to the number of vCPUs”.

The graph and text are correct (although I am still confused about how things in group work together), but I would like the graph to show only the elements whose variance is greater than one, and I cannot find a way to do that.

See you soon,
Frank

Plot.plot({
  width: 1600,
  marginLeft: 50,
  marginBottom: 50,
  x: { tickRotate: -90 },
  y: { grid: true, nice: true },
  marks: [
    Plot.barY(
      dataus1,
      Plot.groupX(
        {
          y: "variance",
          filter: (d) => d.length > 1
        },
        {
          x: "generation",
          sort: { x: "y", reverse: true },
          fill: "generation",
          y: "pricePervCPU"
        }
      )
    ),
    Plot.textY(
      dataus1,
      Plot.groupX(
        {
          text: "variance",
          y: "variance",
          filter: (d) => d.length > 1
        },
        {
          x: "generation",
          y: "pricePervCPU",
          text: "pricePervCPU",
          sort: { x: "y", reverse: true },
          dy: -10
        }
      )
    )
  ]
})

Not an easy one! Here’s a filter that only retains the groups that have a variance > 1:

Plot.barY(
  dataus1,
  Plot.groupX(
    {
      y: "variance",
      filter: (groupedData) => d3.variance(groupedData, (d) => d.pricePervCPU) > 1
    },
    {
      x: "generation",
      sort: { x: "y", reverse: true },
      fill: "generation",
      y: "pricePervCPU"
    }
  )
),

(It is a bit unfortunate that we can’t easily reuse the value computed in y.)

Note that you shouldn’t specify sort: {x: "y", reverse: true} twice; only one of them will apply anyway, so repeating it creates some uncertainty.
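On the point about not being able to reuse the value computed in y, one possible workaround, sketched here under assumptions, is to aggregate outside of Plot first and hand the marks one pre-computed row per generation. The rows below are hypothetical stand-ins for dataus1 (only the generation and pricePervCPU columns are assumed), and variance, groupBy and varianceByGeneration are hypothetical names; variance is a hand-written stand-in that follows d3.variance’s documented behaviour (sample variance, undefined for fewer than two numbers).

```javascript
// Hypothetical rows standing in for dataus1; only these two columns are assumed.
const dataus1 = [
  { generation: "A", pricePervCPU: 2 },
  { generation: "A", pricePervCPU: 4 },
  { generation: "B", pricePervCPU: 5 },
  { generation: "B", pricePervCPU: 5.5 },
  { generation: "C", pricePervCPU: 7 },
  { generation: "D", pricePervCPU: 1 },
  { generation: "D", pricePervCPU: 2 },
  { generation: "D", pricePervCPU: 3 },
];

// Stand-in for d3.variance: sample (n - 1) variance that skips null/NaN
// and returns undefined when fewer than two numbers remain.
function variance(rows, accessor = (d) => d) {
  const xs = rows
    .map(accessor)
    .filter((v) => v != null && !Number.isNaN(+v))
    .map(Number);
  if (xs.length < 2) return undefined;
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((s, x) => s + (x - mean) ** 2, 0) / (xs.length - 1);
}

// Hypothetical helper: one array of rows per generation (the x key).
function groupBy(rows, key) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row[key])) groups.set(row[key], []);
    groups.get(row[key]).push(row);
  }
  return groups;
}

// Compute each generation's variance once, then keep only those above 1.
const varianceByGeneration = [...groupBy(dataus1, "generation")]
  .map(([generation, rows]) => ({
    generation,
    variance: variance(rows, (d) => d.pricePervCPU),
  }))
  .filter((d) => d.variance > 1);

console.log(varianceByGeneration); // only generation "A" (variance 2) remains
```

Because the variance of a single-row group is undefined and undefined > 1 is false, the variance > 1 test also drops the groups that d => d.length > 1 was removing. With one row per generation, the bar and text marks could read x: "generation" and y: "variance" (plus text: "variance" for the labels) directly from varianceByGeneration, with no group transform; inside the notebook, d3.rollup could replace the hand-written groupBy.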


I agree, that would be great.

Thank you very much

I was too quick to reply.
Can you explain where groupedData comes from?

groupedData is the grouped data 😉 It’s the result of grouping on x. In other words, all the data points that share the same value of x are put together in an array, which is then passed as an argument to the reducer defined in the outputs.

In general, if an output has a corresponding input (as “y” does), its reducer does not receive the whole data for each group, only that group’s values of y. The function is called once for each group.
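To make the two cases concrete, here is a simplified, hypothetical model in plain JavaScript (not Plot’s actual code) of grouping on x and calling one reducer per group. The data rows, yReducer and filterReducer are made up for illustration. The y reducer has a matching input, so it gets that group’s pricePervCPU values; the filter reducer has none, so it gets the group’s rows, which is the array named groupedData in the filter above.

```javascript
// Hypothetical rows standing in for dataus1 (only two columns assumed).
const data = [
  { generation: "A", pricePervCPU: 2 },
  { generation: "A", pricePervCPU: 4 },
  { generation: "B", pricePervCPU: 5 },
];

// Step 1: put every row that shares an x value (generation) into one array.
const groups = new Map();
for (const d of data) {
  if (!groups.has(d.generation)) groups.set(d.generation, []);
  groups.get(d.generation).push(d);
}

// "y" has a matching input (y: "pricePervCPU"), so its reducer receives
// only that group's y values, e.g. [2, 4] for generation "A".
const yReducer = (values) => values.reduce((a, b) => a + b, 0) / values.length;

// "filter" has no matching input, so its reducer receives the group's rows;
// the parameter name groupedData is just a label for that array.
const filterReducer = (groupedData) => groupedData.length > 1;

// Step 2: call the reducers once per group.
const output = [];
for (const [x, groupedData] of groups) {
  if (!filterReducer(groupedData)) continue; // drop groups that fail the filter
  const yValues = groupedData.map((d) => d.pricePervCPU);
  output.push({ x, y: yReducer(yValues) });
}

console.log(output); // one entry: x "A", y 3 (the mean of 2 and 4)
```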

Hello, amazing and responsive Plot people.

I asked about groupedData because I cannot find a reference to it anywhere in the documentation.

When I use transforms in Plot, I still find the relationship between the outputs and the options akin to black magic. My current approach is: guess, try, check whether it works, repeat.

In this example (see attached image), how are the two text fields related?

I tried reading the docs on GitHub, but I still scratch my head every time. I am sure I am missing something obvious, but I cannot find it on my own.


One important thing to note is that groupedData isn’t a name defined by Plot; it’s simply the parameter name that Fil chose for his example. It would be equally valid (though less readable) to write the filter as

filter: (x) => d3.variance(x, (d) => d.pricePervCPU) > 1

As for the relationship between those two fields, the groupX function defines it. groupX goes through all the fields in the second object (the input object) except x, since x is what groupX groups by. For each of those fields, it collects the values that share the same x value into a group.

Then, for each of those groups, it consults the output object (the first one) to decide what to do with the group. In this case, the output’s text field says to compute the variance. So the text portion of this example could alternatively be expressed with these instructions:

  • Group all rows in dataus1 by the x field, which is calculated from the generation column of the row.
  • For each group, calculate the variance of the pricePervCPU column.
  • Assign that calculated variance to the “text” output channel in the plot.

The fields in the input and output objects are always matched by channel name: the input’s text is related to the output’s text, y to y, and so on. The only exception is x, because that is the field being grouped by. You could also use groupY, which groups by the y field instead.
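The name matching can be sketched the same way. The function groupSketch below is hypothetical and far simpler than Plot: it groups on whichever key it is given ("x" for groupX-like behaviour, "y" for groupY-like), pairs each output with the input of the same name, and passes the group’s rows to any output that has no matching input (such as filter or count). Only two named reducers, variance and count, are modelled, and the data rows are made up.

```javascript
// Hypothetical, heavily simplified model of a group transform; NOT Plot's code.
// It only illustrates how outputs are paired with inputs of the same name.

// Two named reducers stand in for Plot's built-in ones (assumption: only
// "variance" and "count" are modelled here).
const namedReducers = {
  count: (values) => values.length,
  variance: (values) => {
    if (values.length < 2) return undefined; // like d3.variance
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    return values.reduce((s, v) => s + (v - mean) ** 2, 0) / (values.length - 1);
  },
};

// key is "x" for groupX-like behaviour, "y" for groupY-like behaviour.
function groupSketch(key, outputs, inputs, data) {
  // Group rows on the key channel, e.g. x: "generation".
  const groups = new Map();
  for (const row of data) {
    const k = row[inputs[key]];
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(row);
  }

  const result = [];
  for (const [k, rows] of groups) {
    const out = { [key]: k };
    let keep = true;
    for (const [name, spec] of Object.entries(outputs)) {
      const reduce = typeof spec === "function" ? spec : namedReducers[spec];
      // Same name in the inputs (text -> text, y -> y): pass that channel's
      // values. No matching input (filter, count): pass the group's rows.
      const arg = name in inputs ? rows.map((row) => row[inputs[name]]) : rows;
      const value = reduce(arg);
      if (name === "filter") keep = Boolean(value);
      else out[name] = value;
    }
    if (keep) result.push(out);
  }
  return result;
}

// Made-up rows; generation "C" has a single row.
const dataus1 = [
  { generation: "A", pricePervCPU: 2 },
  { generation: "A", pricePervCPU: 4 },
  { generation: "B", pricePervCPU: 5 },
  { generation: "B", pricePervCPU: 5.5 },
  { generation: "C", pricePervCPU: 7 },
];

// Mirrors the outputs/inputs of the textY mark in the question.
const byGeneration = groupSketch(
  "x",
  { text: "variance", y: "variance", filter: (d) => d.length > 1 },
  { x: "generation", y: "pricePervCPU", text: "pricePervCPU" },
  dataus1
);
console.log(byGeneration);

// groupY-like: group on y and count rows; count needs no matching input.
const countByGeneration = groupSketch("y", { x: "count" }, { y: "generation" }, dataus1);
console.log(countByGeneration);
```

The real transform does considerably more (for example, it also subdivides groups by z, fill or stroke); only the name-pairing rule is modelled here.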


This very good explanation could well be part of the documentation for transforms.

Thanks
Frank