Creating an Observable Tree Plot from stratified csv data

Notebook

Sample Observable Notebook

Issue

I believe I have correctly stratified my csv dataset. The structure looks right: manually traversing the stratified data matches the intended tree hierarchy below. However, when I pass the stratified data to an Observable Plot tree, it shows a different structure. It also displays [object Object] instead of BA.1.1.1.2 (my 1st step) or BC.2 (my 2nd step).

Intended Tree Hierarchy

This graph was generated using the path data format. This path format no longer suits my needs and I need to expand on what I can output.

BA
BA|BA.1
BA|BA.1|BA.1.1
BA|BA.1|BA.1.1|BA.1.1.1
BA|BA.1|BA.1.1|BA.1.1.1|BA.1.1.1.1
BA|BA.1|BA.1.1|BA.1.1.1|BA.1.1.1.2
BA|BA.1|BA.1.1|BA.1.1.2

OR

BA
BA|BA.1
BA|BA.1|BA.1.1
BA|BA.1|BA.1.1|BA.1.1.1
BA|BA.1|BA.1.1|BA.1.1.1|BC.1
BA|BA.1|BA.1.1|BA.1.1.1|BC.2
BA|BA.1|BA.1.1|BA.1.1.2

Actual Tree Hierarchy

This is the output that I get when I use Observable Plot’s Plot.tree.

You don’t need to build the tree yourself. Plot.tree takes tidy data (like all marks in Plot). So you can pass your csv directly to Plot.tree, and then tell it which column specifies the path (partial_alias_pango) and which delimiter you want (.). Like so:

Plot.plot({
  axis: null,
  inset: 10,
  insetRight: 120,
  height: 180,
  marks: Plot.tree(csv, {path: "partial_alias_pango", delimiter: ".", text: "node:path"})
})

Oh sorry, the delimited strategy is what I currently have implemented and that no longer is enough for me. I need the ability to do more.

Basically I am moving from the delimited strategy that you suggested to the d3.stratify strategy. I have successfully stratified my data (I believe) but I cannot get the graph to display (as seen in the notebook). I have also updated my notebook to hopefully make this more clear.

In what way do you need to do more? There’s nothing that can be expressed using d3.stratify / d3.hierarchy that can’t equivalently be expressed as a tidy array of nodes with a delimited path.

If you want to convert your d3.hierarchy back into an array of nodes for Plot, you can call root.descendants to get an array like so:

Plot.plot({
  axis: null,
  inset: 10,
  insetRight: 120,
  height: 200,
  marks: Plot.tree(stratified_data.descendants(), {path: "id", delimiter: "."})
})

But I don’t see any reason to use d3.stratify when you’re immediately undoing that work so you can pass the data back to Plot. Maybe you could elaborate on what you’re trying to do?

[EDIT] Note that for my new stratify strategy, my data is not delimited at all.

I plan on modifying the chart paths and nodes. I am not fully sure exactly how I will go about doing that right now, but I don’t see a way forward using the existing delimited path strategy. Some high-level ideas I have are to somehow prune or hide nodes/branches, and possibly be able to emphasize more newly added nodes/branches.
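For what it is worth, here is a minimal sketch of how pruning and flagging newer nodes might look on a tidy array of rows, before anything is handed to Plot. The sample rows, the dates, and the helper names prune and flagRecent are all hypothetical; only the pipe-delimited path format and the designated_date column name come from this thread.

```javascript
// Hypothetical rows in the pipe-delimited format used in this thread.
// The dates are made up for illustration only.
const rows = [
  {path: "BA", designated_date: "2021-11-01"},
  {path: "BA|BA.1", designated_date: "2021-11-01"},
  {path: "BA|BA.1|BA.1.1", designated_date: "2021-12-01"},
  {path: "BA|BA.1|BA.1.1|BA.1.1.1", designated_date: "2022-01-01"},
  {path: "BA|BA.1|BA.1.1|BA.1.1.1|BC.1", designated_date: "2022-06-01"},
  {path: "BA|BA.1|BA.1.1|BA.1.1.2", designated_date: "2022-02-01"}
];

// Drop a node and everything beneath it by matching on the path prefix.
function prune(rows, branch, delimiter = "|") {
  return rows.filter(
    (d) => d.path !== branch && !d.path.startsWith(branch + delimiter)
  );
}

// Flag rows designated on or after a cutoff.
// ISO date strings (YYYY-MM-DD) compare correctly as plain strings.
function flagRecent(rows, cutoff) {
  return rows.map((d) => ({...d, recent: d.designated_date >= cutoff}));
}

const pruned = prune(rows, "BA|BA.1|BA.1.1|BA.1.1.1");
const flagged = flagRecent(pruned, "2022-01-01");
```

The pruned array can be passed to Plot.tree with path: "path" and delimiter: "|", the same way as the unfiltered data. How the recent flag would drive styling is left open here.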

I see the existing delimited path strategy as being fairly static with my current understanding, and that is why I am trying to find a new way forward. With the stratified strategy, I have already been able to add additional data into each node (pango, partial_alias_pango, unaliased_pango, designated_date), and my current idea is that by adding more data, I can do some of the more dynamic aspects.

The full dataset as a tree (over 700 nodes) cannot be downloaded and rendered legibly on social media websites like Twitter. Even when I break my dataset into sub-branches (BA.1, BA.2, BA.3, BA.4, BA.5, and Recombinants), some of the trees (BA.2 and BA.5) still render poorly. If the content cannot be rendered correctly on 3rd-party websites, my current approach is kind of pointless. The issue is that all uploaded pictures are scaled down, which makes the text of a 200-300 node tree very difficult to read.

I wouldn’t mind if my assumptions were wrong and I can still do everything with the delimited path strategy. However, for future use, I would like to know how to overcome my new issue as stated above. Future projects might be better suited to stratifying than to delimited paths.

While this does appear to create the same hierarchy, this data isn’t delimited at all. The dots are a part of the name that needs to be displayed.

In my previous implementation, I had created a delimited format that could be easily read into Plot.tree(...). Here is the delimited content I had created before (notice the pipe is the delimiter):

BA
BA|BA.1
BA|BA.1|BA.1.1
BA|BA.1|BA.1.1|BA.1.1.1
BA|BA.1|BA.1.1|BA.1.1.1|BC.1
BA|BA.1|BA.1.1|BA.1.1.1|BC.2
BA|BA.1|BA.1.1|BA.1.1.2

And here is the implementation that I created (but for the BA.4 sublineages since it fits nicely in a small screenshot):

TreePlot = function (data, column, { width = 300, height = 1000 } = {}) {
  return Plot.plot({
    axis: null,
    inset: 10,
    insetRight: 50,
    width: width,
    height: height,
    marks: [
      Plot.tree(aq.from(data).select(column), {
        path: column,
        delimiter: "|",
        treeAnchor: "left"
      })
    ]
  });
}

With the original csv data (csv_data), I set the delimiter to be , and the path to be partial_alias_pango, and that brings up a no root error. I don’t believe this is what you meant, though. I believe you meant the actual parsed csv (csv). However, csv doesn’t have any kind of delimited mark.

With the parsed csv (csv), I have to programmatically set the parent, because this data is not explicitly set in the csv data. Basically, the parent of BA.x.y.z is BA.x.y (the parent can only be determined from the unaliased_pango or partial_alias_pango columns).
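The parent rule above can be sketched in plain JavaScript, along with a way to rebuild the pipe-delimited path while keeping the dots inside each displayed name. This is only a sketch under assumptions: the sample rows are guesses based on the hierarchy shown in this thread (BC.1 and BC.2 standing in for BA.1.1.1.1 and BA.1.1.1.2), and parentOf and withPipePath are hypothetical helper names.

```javascript
// Assumed sample rows: `partial_alias_pango` carries the hierarchy,
// `pango` is the name that should be displayed.
const lineages = [
  {pango: "BA", partial_alias_pango: "BA"},
  {pango: "BA.1", partial_alias_pango: "BA.1"},
  {pango: "BA.1.1", partial_alias_pango: "BA.1.1"},
  {pango: "BA.1.1.1", partial_alias_pango: "BA.1.1.1"},
  {pango: "BC.1", partial_alias_pango: "BA.1.1.1.1"},
  {pango: "BC.2", partial_alias_pango: "BA.1.1.1.2"},
  {pango: "BA.1.1.2", partial_alias_pango: "BA.1.1.2"}
];

// The parent of BA.x.y.z is BA.x.y; a name with no dot is a root.
function parentOf(name) {
  const i = name.lastIndexOf(".");
  return i < 0 ? null : name.slice(0, i);
}

// Add a `parent` id and a pipe-delimited `path` of display names to each row.
function withPipePath(rows) {
  // Look up the display name for each hierarchy id.
  const label = new Map(rows.map((d) => [d.partial_alias_pango, d.pango]));
  return rows.map((d) => {
    const segments = [];
    for (let id = d.partial_alias_pango; id !== null; id = parentOf(id)) {
      // Fall back to the id itself if an ancestor has no row of its own.
      segments.unshift(label.get(id) ?? id);
    }
    return {
      ...d,
      parent: parentOf(d.partial_alias_pango),
      path: segments.join("|")
    };
  });
}

const tidy = withPipePath(lineages);
```

The parent column is the shape d3.stratify expects via its parentId accessor, and the path column is the shape Plot.tree expects with delimiter: "|", so the same rows can feed either approach.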

Regardless of what I plan on doing, I just want to know how to get rid of [object Object]. The stratified data appears to match what it should based on my dataset.
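One likely cause, which I have not verified against the notebook: Plot.tree builds its hierarchy from a path string per datum, so if it receives node objects without a path option, each object would be coerced to a string, and that coercion yields [object Object]. A possible workaround is to derive a path string from each node before plotting. The sketch below uses stand-in nodes with id and parent properties, on the assumption that this matches what d3.stratify produces. pathOf is a hypothetical helper.

```javascript
// Coercing a plain object to a string produces the label in question.
const coerced = String({id: "BC.2"});

// Walk `parent` pointers from a node up to the root and join the ids.
function pathOf(node, delimiter = "|") {
  const ids = [];
  for (let n = node; n != null; n = n.parent) ids.unshift(n.id);
  return ids.join(delimiter);
}

// Stand-in nodes (assumption: shaped like d3.stratify output).
const root = {id: "BA", parent: null};
const child = {id: "BA.1", parent: root};
const leaf = {id: "BA.1.1", parent: child};

// One plain row per node, each with a delimited path string.
const rowsForPlot = [root, child, leaf].map((n) => ({
  id: n.id,
  path: pathOf(n)
}));
```

With the real hierarchy, the same mapping could be applied to stratified_data.descendants(), and the result passed to Plot.tree with path: "path" and delimiter: "|". Because the delimiter is a pipe, the dots stay part of each displayed name.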