Your references (1, 2) were extremely helpful in understanding exactly what is going on. Thank you!
The ‘A few words about table expressions and op…’ section in Introducing Arquero really explains the issue.
Unfortunately, for my use case, I’m dynamically creating new columns/attributes for the rollup-table and can’t reference columns/attributes that are being generated in the rollup as parameters for the op.corr method.
The params method/pattern is only applicable for calculating values as a function of the current table row being processed. It is a similar scenario to what @bmschmidt explained in this notebook.
So my use of the eval appears to be the only option so far for my specific situation.
I think the reason this is difficult is that the data columns are not tidy. If you fold the data into a long format first, the cross-product correlation is relatively straightforward in arquero: the only weird bit is a custom join function to avoid duplicating keys on the left and right.
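{
  // Fold every column except Date into (company, price) pairs: one row per date and company.
  const long = aq
    .from(data)
    .fold(aq.not("Date"), {"as": ["company", "price"]});
  // Self-join on Date; a.company < b.company keeps each unordered pair only once.
  // Columns that collide in the join get the _1 / _2 suffixes.
  return long
    .join(long, (a, b) => op.equal(a.Date, b.Date) && a.company < b.company)
    .groupby("company_1", "company_2")
    .rollup({correlation: op.corr("price_1", "price_2")})
    .orderby(aq.desc("correlation"))
    .view();
}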
Thanks for your elegant approach, but it appears to apply only when one knows ahead of time exactly which data columns are going to participate in the correlation and which column keys (uniquely identifies) each row, in this case ‘Date’.
Knowing that ‘Date’ is the row identifier allows your logic to negate/remove it from the fold and also use it in the custom join.
The generalized approach is data-driven and dimension-independent. It uses the data profile to determine which columns are measures and then dynamically proceeds with the correlation analyses.
If the data has anything that is being ‘measured’, it will correlate against generic metric_1, metric_2 pairings and then rank those correlations. The stock data was used as a convenient data source to illustrate that analysis pattern.
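To make the data-driven idea concrete, here is a minimal plain-JavaScript sketch of the pattern. It is not the notebook's code and does not use Arquero. The helper names (measureColumns, pearson, rankCorrelations) are made up for illustration, and it assumes that a ‘measure’ is any column whose value is a finite number in every row, and that a Pearson coefficient is the correlation wanted.

```javascript
// Illustrative sketch only: helper names are hypothetical, not from the notebook.

// Assumption: a column is a "measure" when every row holds a finite number.
function measureColumns(rows) {
  if (rows.length === 0) return [];
  return Object.keys(rows[0]).filter(key =>
    rows.every(row => typeof row[key] === "number" && Number.isFinite(row[key]))
  );
}

// Pearson correlation coefficient of two equal-length numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, v) => sum + v, 0) / n;
  const meanY = ys.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlate every unordered pair of measure columns and rank the results
// from highest to lowest correlation.
function rankCorrelations(rows) {
  const measures = measureColumns(rows);
  const results = [];
  for (let i = 0; i < measures.length; i++) {
    for (let j = i + 1; j < measures.length; j++) {
      const metric_1 = measures[i];
      const metric_2 = measures[j];
      results.push({
        metric_1,
        metric_2,
        correlation: pearson(rows.map(r => r[metric_1]), rows.map(r => r[metric_2]))
      });
    }
  }
  return results.sort((a, b) => b.correlation - a.correlation);
}

// Made-up sample rows; the string-valued Date column is skipped automatically.
const sampleRows = [
  {Date: "2020-01-01", a: 1, b: 2, c: 3},
  {Date: "2020-01-02", a: 2, b: 4, c: 2},
  {Date: "2020-01-03", a: 3, b: 6, c: 1}
];
const ranked = rankCorrelations(sampleRows);
```

Nothing here needs to know which column is the row identifier: non-numeric columns simply fail the measure check, so the caller never names ‘Date’.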
I added a ‘Usage’ section to the notebook.
Here’s how you would use correlationAnalysis against the ‘beers’ data from the Arquero introduction:
import {beers} from "@uwdata/introducing-arquero"
beersData = beers.objects();
import {correlationAnalysis} with {beersData as data} from "@mariodelgadosr/dow-jones-industrial-average-correlation-analysis-with-ar"
…At first glance table expressions look like normal JavaScript functions… but hold on! Under the hood, Arquero takes a set of function definitions, maps them to strings, then parses, rewrites, and compiles them to efficiently manage data internally…
With this in mind, the Limitations section has this critical recommendation:
…Alternatively, for programmatic generation of table expressions one can fallback to generating a string – rather than a proper function definition – and use that instead…
The string that was being passed to eval can instead be passed directly as a table expression. See the solution implemented in cell rollupObj.
Be careful to follow the exact string formatting requirements:
…using an identifier other than d will fail. In contrast, with an explicit function definition you are free to rename the argument as you see fit…
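For anyone landing here later, a rough sketch of what generating those strings can look like. This is not the code from the rollupObj cell. buildCorrelationRollup and the "A|B" output naming are made up for illustration, and it assumes Arquero accepts an expression string of the form op.corr(d["A"], d["B"]) as a rollup value, with d as the row identifier per the quoted limitation.

```javascript
// Hypothetical helper: builds one table-expression string per unordered
// pair of column names. Only the strings are generated here; no Arquero
// call is made.
function buildCorrelationRollup(columns) {
  const rollup = {};
  for (let i = 0; i < columns.length; i++) {
    for (let j = i + 1; j < columns.length; j++) {
      const a = columns[i];
      const b = columns[j];
      // JSON.stringify quotes and escapes each name, so column names that
      // contain spaces or quotes still produce a valid expression string.
      // The row identifier must be `d`.
      rollup[`${a}|${b}`] = `op.corr(d[${JSON.stringify(a)}], d[${JSON.stringify(b)}])`;
    }
  }
  return rollup;
}

// Example column names, chosen only for illustration.
const rollupSpec = buildCorrelationRollup(["AAPL", "MSFT", "IBM"]);

// Assumed usage with an existing Arquero table named `table`:
//   table.rollup(rollupSpec)
```

Since the values are already strings, there is nothing left to eval; the object can be handed to rollup as is, under the assumption above.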