Plot average of many lines

I’m using Observable Plot to plot many lines and I would like to also plot the average of these lines.

I managed to plot the lines themselves, but I’m struggling to plot the average.

I’ve been struggling with the documentation for groupX and binX. I find the terminology and the API to have a steep learning curve! 🙂

The parquet file I’m using looks like this:

            the_value                        session_id
0.000000     0.009626  5b85b678d44228745db4bb13c8fbf7e2
0.002865     0.009681  5b85b678d44228745db4bb13c8fbf7e2
0.005731     0.010795  5b85b678d44228745db4bb13c8fbf7e2
0.008596     0.011571  5b85b678d44228745db4bb13c8fbf7e2
0.011461     0.011603  5b85b678d44228745db4bb13c8fbf7e2
...               ...                               ...
0.994005     0.482123  7933278e2b4236a0d52085230a80f37a
0.995204     0.496144  7933278e2b4236a0d52085230a80f37a
0.996403     0.505115  7933278e2b4236a0d52085230a80f37a
0.997602     0.534115  7933278e2b4236a0d52085230a80f37a
0.998801     0.538065  7933278e2b4236a0d52085230a80f37a

[31794 rows x 2 columns]

I have 96 unique session_ids.

My code looks like this.

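Plot.plot({
  grid: true,
  inset: 10,
  x: { tickFormat: d3.format(".2f"), domain: [0, 0.4] },
  marks: [
    // One ECDF line per session
    Plot.lineY(ecdf_data, {x: "the_value", y: "proportion", stroke: "session_id"}),
    // Mean proportion per bin of the_value, across all sessions
    Plot.lineY(ecdf_data, Plot.binX({y: "mean"}, {x: "the_value", y: "proportion", stroke: "black", strokeWidth: 3}))
  ]
})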

This results in the following plot where there is only a small number of bins and the average value seems completely off.


I tried many things, including changing the binning thresholds.

Essentially, I want to group only by the_value and plot the average proportion within each small bin of the_value.

This looks correct at first glance, but if you can share the data (or notebook) it will be easier to investigate.

I’d try calculating a few of the bin averages by hand (OK, not by hand, but with “raw” JavaScript), plotting them as dots on the same graph, and seeing if they line up with the dark line.
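For what it’s worth, here is a minimal sketch of that kind of check in plain JavaScript. The helper name `binMeans` and the `binWidth` parameter are made up for illustration, the sample rows are invented, and the fixed-width bins will not necessarily match the thresholds Plot picks on its own:

```javascript
// Hypothetical helper: average `proportion` within fixed-width bins of `the_value`.
// Field names follow the question; `binWidth` is an assumed parameter.
function binMeans(rows, binWidth) {
  const bins = new Map(); // bin index -> running {sum, count}
  for (const {the_value, proportion} of rows) {
    const i = Math.floor(the_value / binWidth);
    const b = bins.get(i) ?? {sum: 0, count: 0};
    b.sum += proportion;
    b.count += 1;
    bins.set(i, b);
  }
  return Array.from(bins, ([i, {sum, count}]) => ({
    the_value: (i + 0.5) * binWidth, // bin midpoint
    proportion: sum / count,         // mean over all rows in the bin
    count
  })).sort((a, b) => a.the_value - b.the_value);
}

// Invented sample rows, only to show the shape of the result.
const sampleRows = [
  {the_value: 0.01, proportion: 0.2, session_id: "a"},
  {the_value: 0.02, proportion: 0.4, session_id: "b"},
  {the_value: 0.11, proportion: 0.5, session_id: "a"}
];
const handComputed = binMeans(sampleRows, 0.05);
```

The result could then be passed to something like `Plot.dot(handComputed, {x: "the_value", y: "proportion"})` alongside the existing marks to compare against the dark line.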

I’m not sure I agree that the average seems off - it’s a very left-tailed distribution (as in, there are few low values but they pull the average way down). But it’s hard to just guess at that because the plot is so crowded.

Thanks! It turns out the issue lay in the data, which explains why the calculation was off compared to what I wanted.

After resampling the distributions and padding with 0s (bfill) and 1s (ffill), I managed to get what I wanted!
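In case someone wants to stay on the JavaScript side, here is a rough sketch of the same idea as I understand it, not the actual Python code. It assumes each session’s points form a step-function ECDF, treats it as 0 before the session’s first value and 1 after its last value, evaluates every session on a shared grid, and averages across sessions. The names `ecdfAt` and `meanEcdf`, the grid, and the sample rows are all hypothetical:

```javascript
// Evaluate one session's ECDF (points sorted by the_value) at x as a step function.
// Assumption: pad with 0 before the first observation and 1 after the last.
function ecdfAt(points, x) {
  if (x < points[0].the_value) return 0;
  if (x > points[points.length - 1].the_value) return 1;
  // Binary search for the last point with the_value <= x.
  let lo = 0, hi = points.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (points[mid].the_value <= x) lo = mid;
    else hi = mid - 1;
  }
  return points[lo].proportion;
}

// Resample every session onto the same grid, then average across sessions.
function meanEcdf(rows, grid) {
  const sessions = new Map(); // session_id -> array of rows
  for (const r of rows) {
    if (!sessions.has(r.session_id)) sessions.set(r.session_id, []);
    sessions.get(r.session_id).push(r);
  }
  for (const pts of sessions.values()) pts.sort((a, b) => a.the_value - b.the_value);
  return grid.map((x) => {
    let sum = 0;
    for (const pts of sessions.values()) sum += ecdfAt(pts, x);
    return {the_value: x, proportion: sum / sessions.size};
  });
}

// Invented two-session example covering different ranges of the_value.
const demoRows = [
  {the_value: 0.1, proportion: 0.5, session_id: "a"},
  {the_value: 0.2, proportion: 1.0, session_id: "a"},
  {the_value: 0.3, proportion: 0.5, session_id: "b"},
  {the_value: 0.4, proportion: 1.0, session_id: "b"}
];
const averaged = meanEcdf(demoRows, [0, 0.1, 0.25, 0.3, 0.5]);
```

Because every session contributes a value at every grid point, the average is always taken over all sessions, which a plain per-bin mean does not guarantee when sessions cover different ranges.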

Though I’m doing all of this on the Python side rather than Observable.

Here’s what I wanted
