Normalised histogram in Observable Plot

I am drawing a histogram like this:

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count"},{x: "x"}))
  ]
})

and it works well. However, I’d like to rescale the histogram to represent a probability distribution (the total area of all bars should be 1).

As far as I can tell, Plot.binX does not support this, so I tried Plot.transformX. However, it’s not clear how to make that work, because I need to normalise by the bin width, and I don’t know how to access it.

Any hints?


Try changing y to "proportion"?

I tried, but it does not work. This is what I get:

The total area looks to be roughly 0.15. According to the documentation:

proportion - the sum proportional to the overall total

it sounds as if proportion sums the values in each bin and divides by the overall total. What I am looking for is the analogue of density=True in matplotlib’s hist:

If True, draw and return a probability density: each bin will display the bin’s raw count divided by the total number of counts and the bin width (density = counts /(sum(counts) * np.diff(bins))), so that the area under the histogram integrates to 1 (np.sum(density * np.diff(bins)) == 1).
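For reference, here is the matplotlib formula quoted above written out in plain JavaScript. This is only a sketch of the arithmetic, not Plot code; densityFromCounts and histogramArea are hypothetical helper names, and the counts and edges are made-up example values with deliberately unequal bin widths, to show why dividing by the bin width matters.

```javascript
// Hypothetical helper: density[i] = counts[i] / (sum(counts) * (edges[i + 1] - edges[i]))
function densityFromCounts(counts, edges) {
  if (edges.length !== counts.length + 1) {
    throw new Error("edges must have exactly one more entry than counts");
  }
  const total = counts.reduce((sum, c) => sum + c, 0);
  return counts.map((c, i) => c / (total * (edges[i + 1] - edges[i])));
}

// Hypothetical helper: area under the histogram = sum of density * bin width.
function histogramArea(density, edges) {
  return density.reduce((sum, d, i) => sum + d * (edges[i + 1] - edges[i]), 0);
}

// Made-up example: the middle bin is twice as wide as the others.
const counts = [2, 6, 2];
const edges = [0, 1, 3, 4];
const density = densityFromCounts(counts, edges);
console.log(density);                      // [0.2, 0.3, 0.2]
console.log(histogramArea(density, edges)); // 1
```

With "proportion" alone the heights would be [0.2, 0.6, 0.2], whose sum (not area) is 1; only the division by the bin width makes the area 1.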

The following reducer should accomplish what you want:

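Plot.binX(
  {
    // density = bin count / total count / bin width, so the total area of the bars is 1
    // (pts is the full data array; bin is the bin extent {x1, x2})
    y: (values, bin) => values.length / pts.length / (bin.x2 - bin.x1)
  },
  {x: "x"} // same x channel as in the question
)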

Here it is in action:

Note that the data consists of 5000 points generated by d3.randomNormal(), and the curve is the density of the standard normal distribution, which illustrates that the technique works.
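The same check can be reproduced outside a notebook. Plot passes a function reducer the values in each bin and the bin's extent ({x1, x2} for binX), which is what the reducer above relies on. The sketch below imitates that call by hand. It is plain JavaScript with no d3 or Plot: mulberry32 (a small seeded PRNG), randomNormal (a Box-Muller transform) and binFixed are hypothetical stand-ins for d3.randomNormal() and Plot's binning, and the bin width of 0.5 is an arbitrary choice.

```javascript
// Hypothetical seeded PRNG (mulberry32) so the run is reproducible; returns values in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: one standard normal sample per call (stand-in for d3.randomNormal()).
function randomNormal(random) {
  const u1 = 1 - random(); // in (0, 1], so Math.log(u1) is finite
  const u2 = random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical fixed-width binning: each bin holds its values and its extent {x1, x2}.
function binFixed(values, width) {
  const lo = Math.floor(Math.min(...values) / width) * width;
  const nBins = Math.floor((Math.max(...values) - lo) / width) + 1;
  const bins = Array.from({ length: nBins }, (_, i) => ({
    values: [],
    x1: lo + i * width,
    x2: lo + (i + 1) * width,
  }));
  for (const v of values) bins[Math.floor((v - lo) / width)].values.push(v);
  return bins;
}

const random = mulberry32(42);
const pts = Array.from({ length: 5000 }, () => randomNormal(random));

// Same formula as the reducer above: count / total count / bin width.
const densityReducer = (values, bin) => values.length / pts.length / (bin.x2 - bin.x1);

const bins = binFixed(pts, 0.5);
const heights = bins.map((b) => densityReducer(b.values, b));

// Total area of the bars: 1 up to floating-point rounding.
const area = bins.reduce((sum, b, i) => sum + heights[i] * (b.x2 - b.x1), 0);

// Largest gap between a bar height and the standard normal pdf at the bin center.
const pdf = (x) => Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
const maxGap = Math.max(...bins.map((b, i) => Math.abs(heights[i] - pdf((b.x1 + b.x2) / 2))));

console.log(area.toFixed(6), maxGap.toFixed(3));
```

The gap to the pdf is small but not zero, since each bar is a noisy estimate from a finite sample; the area, by construction, is 1.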

Thanks, yes this is exactly what I am trying to achieve.

Is there a more automatic way than dividing by the number of points? I want to do this in more complicated situations, with facets or different color groups, so for each group the normalising factor will be different.

I forked your notebook here https://observablehq.com/d/0ef7cf5eae234601 for a simple example.

Not that I see; the options to bin are documented here:

@fil would be the expert on the latest and greatest, though.

For facets, we have the proportion-facet reducer, but I realize this is not exactly what you’re looking for, since you really want to normalize by series (facet + z).
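To make "normalize by series" concrete: each (facet, z) group is divided by its own size rather than by the size of the whole dataset, so every series integrates to 1 on its own. The sketch below does this in plain JavaScript, outside Plot. The field names (x, facet, color), the shared thresholds and the helper seriesDensities are all hypothetical.

```javascript
// Hypothetical helper: density histogram per series (facet + z) over shared thresholds.
function seriesDensities(rows, thresholds) {
  // Group x values by series key = facet + z (here the "color" field).
  const groups = new Map();
  for (const row of rows) {
    const key = `${row.facet}|${row.color}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row.x);
  }

  const result = new Map();
  for (const [key, xs] of groups) {
    const counts = new Array(thresholds.length - 1).fill(0);
    for (const x of xs) {
      // Bin [t[i], t[i+1]); the last bin also includes its right edge.
      // Values outside the thresholds are dropped, which would make the area < 1.
      for (let i = 0; i < counts.length; i++) {
        const last = i === counts.length - 1;
        if (x >= thresholds[i] && (x < thresholds[i + 1] || (last && x === thresholds[i + 1]))) {
          counts[i]++;
          break;
        }
      }
    }
    // The normalising factor is this series' own size, not the whole dataset's.
    result.set(key, counts.map((c, i) => c / xs.length / (thresholds[i + 1] - thresholds[i])));
  }
  return result;
}

// Made-up data: three series of different sizes.
const rows = [
  { x: 0.2, facet: "A", color: "red" },
  { x: 0.7, facet: "A", color: "red" },
  { x: 1.5, facet: "A", color: "blue" },
  { x: 0.4, facet: "B", color: "red" },
  { x: 1.1, facet: "B", color: "red" },
  { x: 1.9, facet: "B", color: "red" },
];
const thresholds = [0, 1, 2];
const densities = seriesDensities(rows, thresholds);
for (const [key, d] of densities) console.log(key, d);
```

Because each series is scaled independently, stacking these heights (as rectY does by default) would no longer give a meaningful total, which is the tension with bars discussed below.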

One way to do the normalization by facet is to divide all the heights by the width of the x interval, using a map transform, as I do in the first chart of https://observablehq.com/@fil/plot-normalized-histograms
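The arithmetic of that map step is small enough to show on its own. Plot.mapY accepts a function that receives the whole array of y values and returns a new array of the same length; with a fixed bin width, dividing each facet’s bin proportions by that width turns them into densities. The sketch below only reproduces this arithmetic in plain JavaScript (it does not reproduce the linked notebook); the width of 0.5 and the proportions are made-up values, and toDensity is a hypothetical name.

```javascript
// Assumed fixed bin width shared by all bins (made-up value).
const width = 0.5;

// Mapper with the array-in, array-out shape that a function passed to Plot.mapY has.
const toDensity = (Y) => Y.map((y) => y / width);

// Made-up proportions of one facet's points per bin (they sum to 1 within the facet).
const proportions = [0.1, 0.4, 0.4, 0.1];
const densities = toDensity(proportions);
console.log(densities); // [0.2, 0.8, 0.8, 0.2]

// Area = sum of density * width, which recovers the facet total of 1.
const area = densities.reduce((sum, d) => sum + d * width, 0);
console.log(area); // 1
```

This only works because every bin has the same width; with variable-width bins, each height would need its own bin’s width, as in the custom reducer earlier in the thread.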

Normalizing each series (including not only fy, but also z) means you should not stack the series. It seems to me that this is in contradiction with the use of bars, which are inherently stacking/summable (unless you make them semi-transparent…). However, it seems like a reasonable ask if you want to represent the probability densities with a line mark, as I do in the second chart.

Or a semi-transparent area mark, for aesthetics:

We would have to see how to implement a native normalization to area = 1, if it’s possible; the relevant code is somewhere around group.js#L422.

Suggestions are welcome! I can see in the source code that we hesitate to remove proportion-facet, maybe there could be a more general mechanism to cover all these cases.

Tracking issue: Density reducer · Issue #1940 · observablehq/plot · GitHub


Thanks a lot! The mapY approach works well for my case!
