Using Plot to present a histogram binned by hour?

Sorry for the rookie question, but can anyone point me to an example of Plot that shows a count of items for each hour, from data that consists of rows of (timestamp, item)?

Here’s a quick page with sample data and an explanation: Hourly binning question / kpm-at-hfi / Observable

Typically something like this:

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp"})).plot()

Or, if you explicitly want to bin by hour:

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp", thresholds: d3.utcHour})).plot()

If you have too many bins (in your case, 668 hours span the dataset), then it’s possible you’ll end up with zero-width invisible rects. You can mitigate this by setting inset: 0, which eliminates the gap between rects.

Plot.rectY(data, Plot.binX({y: "count"}, {x: "timestamp", thresholds: d3.utcHour, inset: 0})).plot()
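To see why the rects can vanish, here is a rough back-of-the-envelope sketch in plain JavaScript. The numbers are assumptions for illustration only: roughly 580 px of horizontal drawing space and a default gap of 0.5 px on each side of a rect. barWidth is a hypothetical helper, not part of Plot.

```javascript
// Hypothetical helper: rendered width of each bar in pixels, never below zero.
function barWidth(innerWidth, binCount, insetPerSide) {
  return Math.max(0, innerWidth / binCount - 2 * insetPerSide);
}

// Assumed numbers: 580 px of horizontal space, 668 hourly bins.
const withGap = barWidth(580, 668, 0.5); // assumed default gap of 0.5 px per side
const noGap = barWidth(580, 668, 0);     // equivalent of inset: 0
console.log(withGap, noGap);
```

Under those assumptions each bin gets less than one pixel, so any gap at all consumes the whole bar, while inset: 0 leaves a thin but visible sliver.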



Thanks, Mike. I did get close to that thanks to the cheatsheets, but (I probably didn’t ask very well) I’m hoping to see all of the 8am occurrences (regardless of day) get counted up in one 8am bar.

Updated Hourly binning question / kpm-at-hfi / Observable

Ah, I see, like an intraday histogram. You’re pretty close already.

The main thing is that you’ll have to be a little bit more explicit in telling the bin transform how to compute thresholds. If you specify the thresholds option as a number but don’t specify the domain option, it’ll compute the domain (the extent) of your data automatically, and then try to divide that into the specified number of bins. For your data, the natural domain is [5, 15]; if you ask for 24 thresholds, you’ll get [5.5, 6, 6.5, 7, …, 14.5]. In other words you’ll get half-hour bins even though your data is defined on the hour.
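To make the half-hour step concrete, here is a minimal sketch of a "nice step" heuristic in the style of d3's tick generator. It is an approximation for illustration, assuming Plot derives numeric thresholds roughly this way; niceStep is a hypothetical helper, not a Plot or d3 function.

```javascript
// Hypothetical helper: approximate the "nice" step (1, 2, or 5 times a power
// of ten) that a d3-style tick generator would pick for `count` intervals.
function niceStep(start, stop, count) {
  const rawStep = (stop - start) / Math.max(1, count);
  const base = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const error = rawStep / base;
  // Round to the nearest of 1, 2, 5, 10 on a logarithmic scale.
  const factor =
    error >= Math.sqrt(50) ? 10 :
    error >= Math.sqrt(10) ? 5 :
    error >= Math.sqrt(2) ? 2 : 1;
  return factor * base;
}

const stepFromData = niceStep(5, 15, 24); // domain inferred from the data
const stepFullDay = niceStep(0, 24, 24);  // explicit domain [0, 24]
console.log(stepFromData, stepFullDay);
```

With the inferred domain [5, 15] the raw step of 10/24 rounds to 0.5, while the explicit domain [0, 24] gives a step of exactly 1, which is why setting the domain yields one bin per hour.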

To fix this, you can set the domain explicitly to [0, 24]:

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count"}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: 24
    })),
    Plot.ruleY([0])
  ]
})

You can alternatively specify the thresholds as [1, 2, 3, 4, …, 23]:

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count"}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: d3.range(1, 24)
    })),
    Plot.ruleY([0])
  ]
})

However, it looks like there’s a bug with the last bin in this approach that I need to investigate. (Probably the legacy of this d3-array bug.) Edit: it’s not a bug… you just have to specify the domain explicitly if you don’t want the last bin to have zero width. I put up a PR for an interval option to make this easier.

Lastly, another gotcha is that date.getHours uses your browser’s local timezone. That’s fine if all of your viewers are in the same timezone; but if they’re not, you’ll probably want to use UTC (for example, date.getUTCHours) or do some timezone conversion yourself. :grimacing:
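As a small illustration of the difference, here is a sketch using the standard Date methods. The sample row is made up, and utcHour is a hypothetical accessor that could stand in for the x option in the examples above.

```javascript
// Hypothetical accessor: hour of day in UTC, independent of the viewer's timezone.
const utcHour = (d) => d.timestamp.getUTCHours();

// Made-up sample row for illustration.
const row = {timestamp: new Date("2021-05-01T08:30:00Z"), item: "a"};

console.log(utcHour(row));             // the same for every viewer
console.log(row.timestamp.getHours()); // varies with the local timezone
```

Whether UTC is the right reference depends on the data; if the timestamps are meaningful in a specific timezone, you would still need to convert to that zone yourself.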


This is great… thanks! Good note about the time zone. I’m only doing some internal analysis so I get to be sloppy, but you’re spot on if I was going to make this available more widely.

Just for learning’s sake… is there any way to be able to display the bins for hours 0, 1, 2, etc that have a count of zero? It would be kind of nice/intuitive to know that I’m looking at the whole day’s worth of hours. I feel like I’ve done this with Vega-Lite, but wanted to try it in the new shiny. :slight_smile:

Updated Hourly binning question / kpm-at-hfi / Observable.


Yep, you can add filter: null to the bin outputs if you don’t want to suppress the empty bins.

Plot.plot({
  marks: [
    Plot.rectY(data, Plot.binX({y: "count", filter: null}, {
      x: d => d.timestamp.getHours(),
      domain: [0, 24],
      thresholds: 24
    })),
    Plot.ruleY([0])
  ]
})
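For intuition about what keeping the empty bins means, here is a plain JavaScript sketch of the same intraday count outside of Plot. It is only an illustration with made-up rows; countByHour is a hypothetical helper, and the row shape (timestamp, item) follows the question.

```javascript
// Hypothetical helper: count rows per hour of day, keeping hours with no rows.
function countByHour(rows) {
  const counts = new Array(24).fill(0); // one slot per hour, 0 through 23
  for (const row of rows) counts[row.timestamp.getHours()] += 1;
  return counts;
}

// Made-up sample rows; timestamps without a "Z" are parsed as local time.
const sample = [
  {timestamp: new Date("2021-05-01T08:15:00"), item: "a"},
  {timestamp: new Date("2021-05-02T08:45:00"), item: "b"},
  {timestamp: new Date("2021-05-02T13:05:00"), item: "c"}
];

const hourly = countByHour(sample);
console.log(hourly);
```

Both 8am rows land in the same slot regardless of day, and the hours with no rows stay in the result as zeros rather than being dropped.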

Ooo nice… thanks!