I’m using Observable Plot to plot many lines and I would like to also plot the average of these lines.
I managed to plot the lines themselves, but I’m struggling to plot the average.
I’ve been struggling with the documentation for groupX and binX. I find the terminology and the API to have a steep learning curve!
The Parquet file I’m using looks like this:
            the_value                        session_id
proportion
0.000000     0.009626  5b85b678d44228745db4bb13c8fbf7e2
0.002865     0.009681  5b85b678d44228745db4bb13c8fbf7e2
0.005731     0.010795  5b85b678d44228745db4bb13c8fbf7e2
0.008596     0.011571  5b85b678d44228745db4bb13c8fbf7e2
0.011461     0.011603  5b85b678d44228745db4bb13c8fbf7e2
...               ...                               ...
0.994005     0.482123  7933278e2b4236a0d52085230a80f37a
0.995204     0.496144  7933278e2b4236a0d52085230a80f37a
0.996403     0.505115  7933278e2b4236a0d52085230a80f37a
0.997602     0.534115  7933278e2b4236a0d52085230a80f37a
0.998801     0.538065  7933278e2b4236a0d52085230a80f37a
[31794 rows x 2 columns]
I have 96 unique session_id values.
My code looks like this.
Plot.plot({
  grid: true,
  inset: 10,
  x: { tickFormat: d3.format(".2f"), domain: [0, 0.4] },
  marks: [
    Plot.lineY(ecdf_data, {x: "the_value", y: "proportion", stroke: "session_id"}),
    Plot.lineY(
      ecdf_data,
      Plot.binX({y: "mean"}, {x: "the_value", y: "proportion", stroke: "black", strokeWidth: 3})
    ),
  ]
})
This results in the following plot where there is only a small number of bins and the average value seems completely off.
I tried many things, including changing the thresholds of the binning, without success.
As a first step, all I want is to bin the rows by the_value and plot the average proportion within each small bin of the_value.
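To make the target computation concrete, here is a minimal sketch in plain JavaScript of what I mean by "average proportion per small bin of the_value". The helper name binMeans, the bin width of 0.01, and the four sample rows are made up for illustration; none of this is part of Observable Plot, and it may not match exactly how binX chooses or labels its bins.

```javascript
// Hypothetical helper (not an Observable Plot API): average `proportion`
// within fixed-width bins of `the_value`, pooling rows from all sessions.
function binMeans(rows, binWidth) {
  const bins = new Map(); // bin index -> running {sum, count}
  for (const {the_value, proportion} of rows) {
    // Skip rows with missing or non-numeric values.
    if (!Number.isFinite(the_value) || !Number.isFinite(proportion)) continue;
    const i = Math.floor(the_value / binWidth);
    const acc = bins.get(i) ?? {sum: 0, count: 0};
    acc.sum += proportion;
    acc.count += 1;
    bins.set(i, acc);
  }
  // One output row per non-empty bin, placed at the bin midpoint, sorted by x.
  return Array.from(bins, ([i, {sum, count}]) => ({
    the_value: (i + 0.5) * binWidth,
    proportion: sum / count
  })).sort((a, b) => a.the_value - b.the_value);
}

// Made-up sample rows, only to show the expected input and output shapes.
const sample = [
  {the_value: 0.001, proportion: 0.0, session_id: "a"},
  {the_value: 0.004, proportion: 0.2, session_id: "b"},
  {the_value: 0.012, proportion: 0.5, session_id: "a"},
  {the_value: 0.018, proportion: 0.7, session_id: "b"}
];
const averaged = binMeans(sample, 0.01);
```

An array shaped like averaged could presumably be drawn as a second mark with Plot.lineY(averaged, {x: "the_value", y: "proportion", stroke: "black", strokeWidth: 3}). Note that this pools all rows in a bin, so a session with more points in a bin weighs more in that bin's mean, which is not necessarily the same as averaging the 96 curves themselves.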