Weighted rolling average with Arquero

I know that you can calculate a moving average with Arquero like so:

data.derive({
  sevenDayAvg: aq.rolling(d => d.average(d.dataField), [-6, 0])
})

but is there a way to calculate a weighted average where points are given less weight the farther they are from the target point (for example, with a Gaussian kernel)?

Not Arquero, but array-blur is a (fast) approximation of a Gaussian kernel applied to 1-D or 2-D data: Fil/array-blur on GitHub (blur an array of numbers in 1 or 2 dimensions).

Oh, nice! Thank you!

Okay, to answer my own question I’ve written an arquero window function that I believe does gaussian kernel density estimation. My stats is a little rusty, so I’d welcome any corrections in what I’ve done here!

normalDistributionGenerator = (location = 0, scale = 1) => {
  return (x) => {
    const left = 1 / (scale * Math.sqrt(2 * Math.PI))
    const rightTop = -((x - location) ** 2)
    const rightBottom = 2 * scale ** 2
    return left * Math.exp(rightTop / rightBottom)
  }
}

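aq.addWindowFunction(
  'kde',
  {
    create: (scale = 1, distributionGenerator = normalDistributionGenerator) => ({
      init: state => state,
      value: (w, f) => {
        // Kernel centred on the current row, with `scale` as its standard deviation
        const normal = distributionGenerator(w.index, scale)
        // Weighted sum over the window; the initial value 0 keeps reduce from
        // using the first index itself as the starting accumulator
        return d3.range(w.i0, w.i1).reduce((acc, i) => acc + w.value(i, f) * normal(i), 0)
      },
    }),
    param: [1, 2], // one field input, up to two extra parameters
  },
  { override: true }
)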
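A quick sanity check on a kernel like the one above: because the window function returns a plain weighted sum, it only behaves like a weighted average if the weights add up to roughly 1. Here is a minimal sketch in plain JavaScript (no Arquero; `gaussianPdf` is a hypothetical stand-in for normalDistributionGenerator) that sums the weights at integer offsets for a scale of 7:

```javascript
// Gaussian probability density centred on `location` with standard deviation `scale`.
// (Hypothetical helper, equivalent in intent to normalDistributionGenerator.)
const gaussianPdf = (location = 0, scale = 1) => (x) =>
  Math.exp(-((x - location) ** 2) / (2 * scale ** 2)) /
  (scale * Math.sqrt(2 * Math.PI));

// Sum the weights at integer offsets -50..50 around a target point at 0.
const weightAt = gaussianPdf(0, 7);
let totalWeight = 0;
for (let offset = -50; offset <= 50; offset++) {
  totalWeight += weightAt(offset);
}

console.log(totalWeight.toFixed(4)); // very close to 1 for this scale
```

So away from the ends of the series, and with a reasonably wide kernel, no extra normalisation should be needed. Near the ends, part of the kernel falls outside the data and the total weight is smaller.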

Usage:

data.derive({
  newCasesKDE: aq.rolling(d => aq.kde(d['actuals.newCases'], 7), [-Infinity, Infinity])
})

Here it is in action

I’ve made an example with array-blur in this notebook: “Time Series Data Smoothing” by Fil on Observable.

You can see a small difference in the last few days, where the KDE drops sharply because it treats the values to the right of the last data point as zeroes; array-blur takes the boundary into account.
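The drop at the end can be reproduced without Arquero. Here is a minimal sketch, assuming a constant series and a Gaussian with standard deviation 7 (the helper names are made up for illustration). Summing weight × value over only the points that exist is the same as treating everything past the ends as zero, so near the boundary roughly half of the kernel's weight is lost:

```javascript
// Gaussian density for an offset `dx` from the target point (hypothetical helper).
const kernelWeight = (dx, sigma) =>
  Math.exp(-(dx ** 2) / (2 * sigma ** 2)) / (sigma * Math.sqrt(2 * Math.PI));

// Plain weighted sum over the existing points only, i.e. implicit zero padding.
function smoothZeroPadded(values, sigma) {
  return values.map((_, center) =>
    values.reduce((acc, v, i) => acc + v * kernelWeight(i - center, sigma), 0)
  );
}

const constantSeries = new Array(60).fill(1);
const smoothedSeries = smoothZeroPadded(constantSeries, 7);

console.log(smoothedSeries[30]); // close to 1 in the middle of the series
console.log(smoothedSeries[59]); // a little over 0.5 at the last point
```

Even though every input value is 1, the smoothed curve falls to about half at both ends, which matches the sharp drop in the chart.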

Awesome! That’s super cool. Yeah, I noticed that drop at the end and wasn’t sure how that’s typically accounted for. Does array-blur adjust the amplitude of the kernel according to how close to the end it is (or is the calculation totally different from that? I admit I haven’t even looked at the code 😬)

In array-blur we clamp the index to the range; in other words, when you reach the end of the data vector, the missing values are taken to be the last value instead of 0. (And similarly on the left-hand side, the missing values “before 0” are v[0] instead of being 0.) It happens here: blur.js on the master branch of Fil/array-blur on GitHub.
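To illustrate the clamping idea, here is a minimal sketch in plain JavaScript. It is not array-blur's actual code (array-blur is a fast approximation, while this is a direct Gaussian-weighted sum), and `gaussWeight`, `smoothClamped` and the default `radius` are assumptions made up for this example. Only the "clamp the index to the range" step is taken from the description above:

```javascript
// Gaussian density for an offset `dx` from the target point (hypothetical helper).
const gaussWeight = (dx, sigma) =>
  Math.exp(-(dx ** 2) / (2 * sigma ** 2)) / (sigma * Math.sqrt(2 * Math.PI));

// Smooth `values`, reading out-of-range indices as the nearest edge value.
// `radius` truncates the kernel; 4 * sigma is an arbitrary choice for this sketch.
function smoothClamped(values, sigma, radius = Math.ceil(4 * sigma)) {
  const last = values.length - 1;
  return values.map((_, center) => {
    let sum = 0;
    let weightTotal = 0;
    for (let offset = -radius; offset <= radius; offset++) {
      // Clamp the index: positions before 0 read values[0],
      // positions past the end read values[last].
      const i = Math.min(last, Math.max(0, center + offset));
      const w = gaussWeight(offset, sigma);
      sum += w * values[i];
      weightTotal += w;
    }
    // Divide by the total weight so the truncated kernel still averages to 1.
    return sum / weightTotal;
  });
}

const flatSeries = new Array(30).fill(5);
const clampedResult = smoothClamped(flatSeries, 3);
console.log(clampedResult[0], clampedResult[29]); // both stay at (approximately) 5
```

With clamping, a constant series stays constant all the way to the edges instead of dropping toward zero.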

Oh, that makes sense! I think I can do that with arquero too. Going to give it a try.
