Using Plot to present a histogram binned by hour?

Sorry for the rookie question, but can anyone point me to an example of Plot showing a count of items per hour, from data whose rows are (timestamp, item) pairs?

Here’s a quick page with sample data and an explanation: Hourly binning question / kpm-at-hfi / Observable

Typically something like this:

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp"})).plot()

Or, if you explicitly want to bin by hour:

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp", thresholds: d3.utcHour})).plot()

If you have too many bins (in your case, 668 hours span the dataset), then it’s possible you’ll end up with zero-width invisible rects. You can mitigate this by setting inset: 0, which eliminates the gap between rects.

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp", thresholds: d3.utcHour, inset: 0})).plot()
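
To see why the rects can vanish, here is a rough back-of-the-envelope sketch. The numbers are assumptions for illustration rather than values taken from the notebook: a 640px-wide plot with 40px and 20px horizontal margins, and a 0.5px inset on each side of every bin. The helper name visibleRectWidth is made up.

```javascript
// Hypothetical helper: approximate on-screen width of one bin's rect, in pixels.
// All default values below are assumptions for illustration.
function visibleRectWidth(binCount, {width = 640, marginLeft = 40, marginRight = 20, inset = 0.5} = {}) {
  const binWidth = (width - marginLeft - marginRight) / binCount; // pixels per bin
  return Math.max(0, binWidth - 2 * inset); // inset applies on both sides; never negative
}

console.log(visibleRectWidth(668));             // 0: the gap eats the whole rect
console.log(visibleRectWidth(668, {inset: 0})); // about 0.87px: thin but visible
console.log(visibleRectWidth(24));              // about 23.2px
```

Under these assumptions each hourly bin gets less than a pixel, so any gap between rects leaves nothing to draw.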


Thanks, Mike. I did get close to that thanks to the cheatsheets, but (I probably didn’t ask very well) I’m hoping to see all of the 8am occurrences (regardless of day) get counted up in one 8am bar.

Updated Hourly binning question / kpm-at-hfi / Observable

Ah, I see, like an intraday histogram. You’re pretty close already.

The main thing is that you’ll have to be a little bit more explicit in telling the bin transform how to compute thresholds. If you specify the thresholds option as a number but don’t specify the domain option, it’ll compute the domain (the extent) of your data automatically, and then try to divide that into the specified number of bins. For your data, the natural domain is [5, 15]; if you ask for 24 thresholds, you’ll get [5.5, 6, 6.5, 7, …, 14.5]. In other words, you’ll get half-hour bins even though your data is defined on the hour.
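
To see where the half-hour bins come from, here is a simplified sketch of a "nice step" heuristic of the kind D3's tick generator uses. It is an illustration only, not Plot's actual code; niceStep and thresholds are made-up names, and the sketch assumes the start of the domain is already a multiple of the step (true for both examples).

```javascript
// Simplified "nice step" heuristic: pick 1, 2, 5, or 10 times a power of ten.
// Illustrative sketch only; not Plot's actual implementation.
function niceStep(start, stop, count) {
  const raw = (stop - start) / count;         // ideal, possibly awkward step
  const power = Math.floor(Math.log10(raw));
  const error = raw / 10 ** power;            // how far raw is above the power of ten
  const factor = error >= Math.sqrt(50) ? 10 : error >= Math.sqrt(10) ? 5 : error >= Math.sqrt(2) ? 2 : 1;
  return factor * 10 ** power;
}

// Interior thresholds strictly between start and stop.
function thresholds(start, stop, count) {
  const step = niceStep(start, stop, count);
  const n = Math.round((stop - start) / step);
  const result = [];
  for (let i = 1; i < n; ++i) result.push(start + i * step);
  return result;
}

console.log(thresholds(5, 15, 24)); // begins 5.5, 6, 6.5: half-hour bins
console.log(thresholds(0, 24, 24)); // 1, 2, ..., 23: one bin per hour
```

Asking for 24 bins over a span of 10 hours rounds to a step of 0.5, while the same count over [0, 24] gives a step of exactly 1.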

To fix this, you can specify the domain explicitly to [0, 24]:

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count"}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: 24
    })),
    Plot.ruleY([0])
  ]
})

You can alternatively specify the thresholds as [1, 2, 3, 4, …, 23]:

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count"}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: d3.range(1, 24)
    })),
    Plot.ruleY([0])
  ]
})

However, it looks like there’s a bug with the last bin with this approach that I need to investigate. (Probably the legacy of this d3-array bug.) (It’s not a bug… you just have to specify the domain explicitly if you don’t want the last bin to have zero width. I put up a PR for an interval option to make this easier.)

Lastly, another gotcha is that date.getHours uses your browser’s local timezone. That’s fine if all of your viewers are in the same timezone; but if they’re not, you’ll probably want to use UTC, so do some timezone conversion yourself. :grimacing:
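
One way to sidestep the local-timezone issue, assuming your timestamps are JavaScript Date objects as in the snippets above, is to read the hour with getUTCHours instead of getHours. The sketch below uses a made-up timestamp, and the options object only mirrors the shape passed to Plot.binX earlier; the accessor is the one thing that changes.

```javascript
// A made-up row for illustration (08:30 UTC).
const sample = {timestamp: new Date("2021-06-01T08:30:00Z"), item: "a"};

// Hour of day in UTC: the same for every viewer, regardless of their timezone.
const utcHour = d => d.timestamp.getUTCHours();

// Hour of day in the viewer's local timezone: varies by where the code runs.
const localHour = d => d.timestamp.getHours();

// Options in the same shape as those passed to Plot.binX above.
const binOptions = {x: utcHour, domain: [0, 24], thresholds: 24};

console.log(binOptions.x(sample)); // 8
console.log(localHour(sample));    // depends on the local timezone
```

If viewers should see the hours in some other fixed timezone, that conversion still has to be done separately.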


This is great… thanks! Good note about the time zone. I’m only doing some internal analysis so I get to be sloppy, but you’re spot on if I was going to make this available more widely.

Just for learning’s sake… is there any way to display the bins for hours 0, 1, 2, etc. that have a count of zero? It would be kind of nice/intuitive to know that I’m looking at the whole day’s worth of hours. I feel like I’ve done this with Vega-Lite, but wanted to try it in the new shiny. :slight_smile:

Updated Hourly binning question / kpm-at-hfi / Observable.


Yep, you can add filter: null if you don’t want to suppress the empty bins.

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count", filter: null}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: 24
    })),
    Plot.ruleY([0])
  ]
})
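
As a sanity check on what the chart should show, the same tally can be computed by hand. This is a plain JavaScript sketch with made-up rows; countByHour is a hypothetical helper, not part of Plot. Every hour gets a slot, so hours with no rows appear as explicit zeros rather than being dropped.

```javascript
// Hypothetical helper: tally rows into 24 hour-of-day slots (local time).
function countByHour(rows) {
  const counts = new Array(24).fill(0);
  for (const row of rows) counts[row.timestamp.getHours()] += 1;
  return counts;
}

// Made-up sample rows, built from local-time components so the result
// does not depend on the timezone where this runs.
const rows = [
  {timestamp: new Date(2021, 4, 3, 8, 15), item: "a"},
  {timestamp: new Date(2021, 4, 4, 8, 45), item: "b"},
  {timestamp: new Date(2021, 4, 4, 13, 5), item: "c"}
];

const counts = countByHour(rows);
console.log(counts[8], counts[13], counts[0]); // 2 1 0
```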

Ooo nice… thanks!