How to create gaps when using group / groupX / groupY in Observable Plot?

On line charts “if some of the x or y channel values are undefined (or null or NaN), gaps will appear between adjacent points”.

If my dataset contains such undefined values, how can I make sure these gaps are preserved when I use group / groupX / groupY?

The plot below is created (some settings removed for brevity) from a dataset where all x values (timestamp) are set and some y values (temp) are undefined.


marks: [
  Plot.frame(),
  Plot.line(data, { x: "timestamp", y: "temp", stroke: "sensor_id" }),
  Plot.line(data,  Plot.groupX({ y: "mean" }, { x: d => new Date(d.timestamp.getTime()).setSeconds(0, 0), y: "temp", stroke: "black" }))
]


The grouping groups all measurements per minute and plots the average in black.
In the data, there is a minute that only has a single entry, and its y value (temp) is undefined.
How can I make this group aggregate to undefined? Is there a setting that controls how group behaves when a single value or all values in a group are undefined?

I managed to make it work by providing a custom aggregation function that takes care of undefined values, but I think there should be an easier way to achieve this. It seems like I am missing something.

Plot.line(data,
          Plot.groupX(
            // Here I check if the grouping consists of a single falsy value
            { y: data => (data.length == 1 && !data[0]) ? undefined : data.reduce((a, b) => a + b, 0) / data.length },
            { x: d => new Date(d.timestamp.getTime()).setSeconds(0, 0), y: "temp", stroke: "black" }
))

// Comment: I am, as a new user, not allowed to add a second image here…

Hello, it will be easier to help if you could share the example notebook.

I suspect that you want to use the interval transform here, with interval: d3.utcMinute, rather than the group transform.

Hello!

I created a notebook for presentation of the problem here: Observable Plot Playground - Grouping / Markus Weninger / Observable

Yet, while creating the notebook, I found my mistake…

I think grouping only returns undefined if all of the values within a group are undefined.
I had a Date that I wanted to add a minute to (to form a separate group with a single undefined value), but accidentally did not.

This resulted in groups such as:

10:00 - [22.1, 22.1, 22.0, 22.2],            AVG = 22.1
10:01 - [22.2, 22.4, 22.4, 22.2, undefined], AVG = 22.3 // Remark: this undefined is just ignored when aggregating the group, e.g., calculating the average
10:15 - [20.2, 20.3, 20.4],                  AVG = 20.3

while I wanted to have

10:00 - [22.1, 22.1, 22.0, 22.2], AVG = 22.1
10:01 - [22.2, 22.4, 22.4, 22.2], AVG = 22.3
10:02 - [undefined],              AVG = undefined
10:15 - [20.2, 20.3, 20.4],       AVG = 20.3
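
To make the behaviour described above concrete, here is a minimal plain-JavaScript sketch of per-minute grouping in which undefined values are skipped when averaging, and a group yields undefined only when it has no defined values at all. The helper names (meanIgnoringMissing, minuteKey, meanPerMinute) are made up for illustration, and the sketch only mimics what I observed; it is not Plot's actual implementation.

```javascript
// Mean of the defined numbers in `values`; undefined when none are defined.
// Assumption: this mirrors the observed behaviour, not Plot's internals.
function meanIgnoringMissing(values) {
  let sum = 0;
  let count = 0;
  for (const v of values) {
    if (v == null || Number.isNaN(v)) continue; // skip undefined, null, NaN
    sum += v;
    count += 1;
  }
  return count > 0 ? sum / count : undefined;
}

// Truncate a Date to the start of its UTC minute and use the ISO string as group key.
function minuteKey(date) {
  const d = new Date(date.getTime()); // copy so the input is not mutated
  d.setUTCSeconds(0, 0);
  return d.toISOString();
}

// Group rows of the form { timestamp: Date, temp: number | undefined } per minute.
function meanPerMinute(rows) {
  const groups = new Map();
  for (const { timestamp, temp } of rows) {
    const key = minuteKey(timestamp);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(temp);
  }
  return new Map([...groups].map(([key, vals]) => [key, meanIgnoringMissing(vals)]));
}
```

With this sketch, a minute containing [22, 23, undefined] averages to 22.5, while a minute containing only [undefined] still appears as a group but with an undefined mean, which is what produces the gap.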

FYI, Safari doesn’t support lax date parsing with the Date constructor, so this will return an Invalid Date and none of the charts display in your notebook:

new Date("01.01.2022 12:00:10")

I recommend using d3.utcParse or the standard ISO 8601 format with your data.
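
As a rough illustration of the conversion, here is a dependency-free sketch that parses strings like the one above into a Date. It assumes the format is day.month.year followed by a 24-hour time, and that the timestamps should be read as UTC; both are guesses about your data. The function name parseDottedTimestamp is made up for this example. With d3, a specifier along the lines of d3.utcParse("%d.%m.%Y %H:%M:%S") should do the same job.

```javascript
// Parse "DD.MM.YYYY HH:MM:SS" explicitly instead of relying on lax Date parsing.
// Assumptions: day comes before month, and the time is interpreted as UTC.
function parseDottedTimestamp(text) {
  const match = /^(\d{2})\.(\d{2})\.(\d{4}) (\d{2}):(\d{2}):(\d{2})$/.exec(text);
  if (!match) return new Date(NaN); // Invalid Date for unrecognised input
  const [, day, month, year, hours, minutes, seconds] = match.map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hours, minutes, seconds));
}
```

Calling toISOString() on the result gives the ISO 8601 form, which every browser parses consistently.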


If you expect to have samples about every minute, and you want to show a gap when you are missing samples for a given minute, then you can use the bin transform with time interval thresholds such as d3.utcMinute. For example:

Plot.plot({
  nice: true,
  grid: true,
  color: {
    legend: true
  },
  facet: { data, y: "device" },
  marks: [
    Plot.frame(),
    Plot.dot(data, {
      x: "timestamp",
      y: "temp",
      stroke: "sensor_id"
    }),
    Plot.line(
      data,
      Plot.binX(
        { y: "mean", filter: null },
        {
          x: "timestamp",
          thresholds: d3.utcMinute,
          y: "temp",
          stroke: "sensor_id"
        }
      )
    )
  ]
})

Note the use of filter: null, which tells the bin transform to return empty bins so that the line mark can render the gaps. Without this the line will interpolate across missing bins.
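
To see why the empty bins matter, here is a simplified plain-JavaScript sketch of the idea, not Plot's actual code: every minute between the first and last sample gets a bin, and a bin without defined values gets an undefined mean instead of being dropped. The function name meanPerMinuteWithGaps and the output shape are made up for illustration.

```javascript
const MINUTE_MS = 60_000;

// Bin rows of the form { timestamp: Date, temp: number | undefined } per UTC minute
// and keep the empty minutes, so a line drawn over the result has gaps.
function meanPerMinuteWithGaps(rows) {
  const bins = new Map(); // minute start (ms) -> { sum, count }
  let first = Infinity;
  let last = -Infinity;
  for (const { timestamp, temp } of rows) {
    const start = Math.floor(timestamp.getTime() / MINUTE_MS) * MINUTE_MS;
    first = Math.min(first, start);
    last = Math.max(last, start);
    if (temp == null || Number.isNaN(temp)) continue; // missing values add nothing
    const bin = bins.get(start) ?? { sum: 0, count: 0 };
    bin.sum += temp;
    bin.count += 1;
    bins.set(start, bin);
  }
  const result = [];
  // Walk every minute in the extent, including those without any samples.
  for (let t = first; t <= last; t += MINUTE_MS) {
    const bin = bins.get(t);
    result.push({ minute: new Date(t), mean: bin ? bin.sum / bin.count : undefined });
  }
  return result;
}
```

Dropping the entries with an undefined mean from this result corresponds to the default behaviour, where the line simply connects the neighbouring minutes.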

You can also use this technique to aggregate across sensors. For example, you can use an area mark to show the extent (min-max range) of the samples for each minute.

Plot.plot({
  nice: true,
  grid: true,
  color: {
    legend: true
  },
  facet: { data, y: "device" },
  marks: [
    Plot.frame(),
    Plot.areaY(
      data,
      Plot.binX(
        { y1: "min", y2: "max", filter: null },
        {
          x: "timestamp",
          thresholds: d3.utcMinute,
          y: "temp",
          fillOpacity: 0.2
        }
      )
    ),
    Plot.dot(data, {
      x: "timestamp",
      y: "temp",
      stroke: "sensor_id"
    }),
    Plot.line(
      data,
      Plot.binX(
        { y: "mean", filter: null },
        {
          x: "timestamp",
          thresholds: d3.utcMinute,
          y: "temp",
          stroke: "sensor_id"
        }
      )
    )
  ]
})

Here’s a notebook to demonstrate:


Thanks a lot Mike,

really love to work with d3 and now with Observable Plot!

Yet, it is the first time that I have worked with dates and times, so thanks for all your hints and even a demonstration notebook!
And I will remember that group should be used for discrete data and bin for continuous data.
I think the “trick” with filter: null could be added to the documentation; currently it only states “By default, empty bins are omitted.”

And I should really get used to using the ISO 8601 format instead of our “informal Austrian way” of writing dates and times :smiley: