Help formatting data to be hierarchical

Has anyone got any tips on processing data to give it the hierarchical parent-child relationships that D3 uses with d3.hierarchy? I wonder if there are tools to assist with processing and validating the relationships in the data. All tips, questions and comments welcome.

no3_hierarchy = {
 if(nitrateData[nitrateData.length-1].children!='root') {  // add a root parent node
   nitrateData.push({parent: "",
                     geography_type: "New Zealand",
                     geography_name: "Aotearoa" ,
                     children: "root",
                     animal: "",          // not sure if I need these, but I have them in case the structure is needed
                     year: "1990",
                     no3_kg_per_yr: 0
                    })
 } 
 let no3h = nitrateData.map( d => {               // map rather than filter: every row is kept and updated in place
    d.year = String(d.year)                        // undo the automatic typing: make the year a string
    if (d.no3_kg_per_yr == -1) d.no3_kg_per_yr = 0 // treat NA (-1) as zero, as some regions used 0 and others NA
    
    switch(d.geography_name){  // add child parent relationships.
        case 'New Zealand':
          d.children=d.geography_name+d.animal+d.year
          d['parent']='root'
          break;
        case 'North Island':
        case 'South Island':
          d.children=d.geography_name+d.animal+d.year
          d['parent']='New Zealand'+d.animal+d.year
          break;
        case "Auckland":
        case "Bay of Plenty":
        case "Gisborne":
        case "Hawke's Bay":
        case "Manawatu-Whanganui":
        case "Waikato":
        case "Wellington":
        case "Northland":
        case "Taranaki":     
          d.children = d.geography_name+d.animal+d.year
          d.parent = 'North Island'+d.animal+d.year
          break;
        case "Otago":
        case "Southland":
        case "Marlborough":
        case "Tasman":
        case "West Coast":
        case "Canterbury":
          d.children = d.geography_name+d.animal+d.year
          d.parent = 'South Island'+d.animal+d.year
          break;
      }
   return d
  })
  return no3h
}
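
On the validation part of the question, here is a minimal sketch in plain JavaScript. It assumes the convention used in the cell above, where each row keeps its own id in `children` and its parent's id in `parent`. `findOrphans` is a hypothetical helper, not a D3 function, and the sample rows are made up; it simply lists the rows whose parent id matches no row.

```javascript
// Hypothetical helper: returns the rows whose `parent` id is not the id of any row.
// Assumes `children` holds the row's own id and the root row has an empty-string parent.
function findOrphans(rows) {
  const ids = new Set(rows.map(d => d.children));
  return rows.filter(d => d.parent !== "" && !ids.has(d.parent));
}

// Made-up sample rows in the same shape as nitrateData after processing.
const sampleRows = [
  { children: "root", parent: "" },
  { children: "New ZealandSheep1990", parent: "root" },
  { children: "North IslandSheep1990", parent: "New ZealandSheep1990" },
  { children: "WaikatoSheep1990", parent: "North IslandSheep1990" },
  { children: "OtagoSheep1990", parent: "South IslandSheep1990" } // its parent row is missing
];

const orphans = findOrphans(sampleRows);
console.log(orphans.map(d => d.children)); // ["OtagoSheep1990"]
```

If the list comes back empty, a table in this id/parent shape looks like the kind of input d3.stratify() is meant for, with `.id(d => d.children)` and `.parentId(d => d.parent)`.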

Hello @hellonearthis. I don't know if I am exactly clear on what you're asking. With regard to manipulating hierarchical data for use in charts, etc., does anything shared in this thread help you? Basically, the discussion relates to using d3 group, rollup and hierarchy to get data into the format expected by a sunburst chart.

I'm so new to D3 that I didn't know about group or rollup, so I rolled my own :smiley: Thanks for pointing me in the right direction.

To be fair, it just entered D3 officially at version 6 (September 2020). Here are the docs:

This capacity massively improved my ability to read through a data set using JS. Thanks to @mbostock and the D3 community!

I am still struggling to get my head around hierarchical data; it's the adding of the parent-child relationships. I've been trying to get it all week, and this is how far I have got.

So I have my tabular CSV, which I roll up using:

rolledup = d3.rollup(cans,
                     v => d3.sum(v, d => d.no3 + d.year),
                    d => d.year, d => d.animal, d => d.no3 )

As I currently understand it, v is like the unique key that is used when making the hierarchy, and the d functions are the keys used to structure the data into 3 levels: a top level of years, where each year contains a level for each animal, and each animal contains a level for the amount of NO₃ it causes.

So once I get this rolledup data structured, I then pass it into my treemap function, which does two things at once.

treemap = data => d3.treemap()
    .size([width, height])
    .padding(1)
    .round(true)
  (d3.hierarchy(data)
      .sum(d => d.value)
      .sort((a, b) => b.value - a.value)) 

First it defines the tree dimensions, which is like setting up the canvas/SVG size, then the hierarchy is processed. I am guessing the d.value was produced by the rollup(). The hierarchy adds the x and y data for the box drawing.

And then I fall into drawing issues that I think are caused by the data, as I don't fully understand how the data is forming and how the hierarchy is defining the box sizes. Having written that out, I'm guessing the sum from the rollup is key to the treemap size, or is it the hierarchy routine that defines these sizes, in which case I should use .sum(d => d.no3)? (Rubber ducking via asking questions :slight_smile: Update: changed the code to that and then added images.)
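
On the sizing question, here is a sketch of what node.sum(f) does in d3-hierarchy as I understand it: each node's value becomes f applied to that node's data plus the values of all its descendants, and the treemap then sizes boxes from node.value. The `sumTree` function and the sample tree are hypothetical, written in plain JavaScript for illustration; this is not D3's implementation.

```javascript
// Hypothetical stand-in for node.sum(f): a post-order walk that stores on each node
// f(node) plus the totals of its children. Here the node and its data are the same object.
function sumTree(node, f) {
  const own = +f(node) || 0;               // missing or non-numeric values count as 0
  const kids = node.children || [];
  node.value = kids.reduce((total, child) => total + sumTree(child, f), own);
  return node.value;
}

// Made-up numbers, nested year → animal.
const tree = {
  name: "1990",
  children: [
    { name: "Beef cattle", no3: 2500 },
    { name: "Sheep", no3: 1000 }
  ]
};

sumTree(tree, d => d.no3);
console.log(tree.value);             // 3500
console.log(tree.children[0].value); // 2500
```

In this sketch the accessor passed to the sum decides the leaf sizes, and every parent is just the total of its children.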

And so, with all these issues, I thought maybe there could be a notebook that is passed a data.csv and shows the structure of the object in a way that can be tweaked by adjusting the sum(), or the order or number of entries passed to the rollup. It could also allow customisations, so other entries can be added for naming or titles.

And my messy notebook where I'm trying to figure it out.

Hello again. I read this post and your notebook. I am on my phone, so it's hard to text. I apologise for the lack of detail and for not trying to work through the problem more…

Have you seen this? I should have referenced it first:

I didn't see this sort of grouping in your notebook, although I did see the hierarchy.

This might give you a different output closer to what you want:

grouped_data = d3.group(cans, d => d.year, d => d.animal,  d => d.no3)

The groupings work sequentially, and thus form a hierarchy of sorts: first year, then animal, then number.

Please forgive me if you've already considered it. It looks to me like you could use the group method as a means to organize your hierarchy in the manner you are after. I'll try to get to a computer tomorrow to look. You're clearly more skilled in JS than I am, so I'm not sure if I can help. Glad to see you as an active community member!

Ah, sorry… I managed to check and it looks like this gives you exactly the same output as your hierarchy :frowning: Sorry for the extra noise!

Thanks, Aaron, for looking at the problem; your noise is welcome. group() was also suggested as a way of structuring the data, but I think that needed more code turning using cans.map().

It seems like there isn't a simple way to convert the data. But that's cool, as it gives me motivation to understand what structure is needed and how to address it.
And as a result of that I might be able to make a data turning notebook for use with d3.

Hi again, @hellonearthis !

Thank you for encouraging continued discussion. And I like your idea for making more notebooks to help you with data processing!

While I would like to reiterate that I am no expert, and clearly you've been through all the docs, based on your encouragement I'd like to note one additional thing I learned and to talk about what I think it does as compared to what I read your notes as saying… hoping also to gain a deeper understanding myself:

(I added in the commented question from your notebook)

I think in this case, the v (which I assume is just shorthand for value and could be any other character you choose) is an "iterable", so its purpose is simply to direct the function about the number of times to run so that it works through all of your data.
I think that the d is just a shorthand for data (again, could be anything), and this is what tells the rollup what variable to use as its primary grouping, with each subsequent variable mapped to in your sequence creating another layer of your hierarchy.
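
One way to test these readings of v and d is a stripped-down, single-key version of the idea in plain JavaScript. `rollupOneLevel` is a hypothetical helper, not D3's code, and the rows are made up. It assumes the same contract as d3.rollup: the key function is called once per row, and the reducer is called once per group with the array of rows in that group.

```javascript
// Hypothetical single-key rollup: group rows by key(d), then reduce each group.
function rollupOneLevel(rows, reduce, key) {
  const groups = new Map();
  for (const d of rows) {            // d is one row of the data
    const k = key(d);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(d);
  }
  const result = new Map();
  for (const [k, v] of groups) {     // v is the array of rows that share key k
    result.set(k, reduce(v));
  }
  return result;
}

// Made-up rows in the shape of the CSV.
const rows = [
  { year: "1990", animal: "Beef cattle", no3: 10 },
  { year: "1990", animal: "Sheep", no3: 5 },
  { year: "1991", animal: "Sheep", no3: 7 }
];

// Reducing each group to its length shows how many rows v holds per year.
const groupSizes = rollupOneLevel(rows, v => v.length, d => d.year);
console.log([...groupSizes]); // [["1990", 2], ["1991", 1]]
```

In this sketch v is a group of rows rather than a key or a loop counter, and d is a single row that the key function reads a property from.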

As for 'converting' data - I am not quite sure where in this process you're getting stuck (and it looks like you got the treemap working). Is the issue with calculating the size of each grouping?

Also - another notebook that might help you (from a person who helped me a lot, @bayre ):

https://observablehq.com/compare/a1fd3857bac219b0@164...1b6b102e10f3f36d@190

That last example really helped.
My code was bad because I was rolling up my data, then passing it through d3.hierarchy, and then again through d3.hierarchy when drawing the chart.
And it seemed to work but gave bad results.

Ben's rollup makes more sense, but I still need to unpack it to be able to explain its flow easily.

rolledupB =  d3.rollup(cans, ([v]) => v[value], ...keys.map(k => d => d[k]))

To me it looks like it's doing this:

There are two keys, year and animal, and the value is that of no3.

v → takes each entry in cans, [v] = {year: "1990", animal: "Beef cattle", no3: "2660.8"}
makes a year object/Map entry at the root level
=====> adds an entry corresponding to the year for Beef Cattle
then
=====> adds an entry corresponding to the year for Dairy Cattle
then
=====> adds an entry corresponding to the year for Deer
then
=====> adds an entry corresponding to the year for Sheep

Then it iterates over the animals in cans for every year, adding the no3 value to each corresponding year.animal entry in rolledupB.

Also I did know that when you use teData = FileAttachment('data.csv').csv(), FileAttachment adds a property columns to teData containing the keys. Very useful.

Anyway, that has really helped me to understand this hierarchy stuff, thanks.

Glad you're making progress! And yeah… That last bit of code uses shorthands that I don't quite follow. When you unpack it, please share :slight_smile:

From https://observablehq.com/d/67ce91e7414b7369

rolledupB = d3.rollup(
  cans,
  ([v]) => v[value],
  ...keys.map(k => d => d[k])
)

Let's break this down:

...keys.map(k => d => d[k])

This takes each key k in the array keys, and constructs from it a function like entry => entry[k] which plucks that key out from an input object. The ellipsis means pass each of those functions in as a separate argument to rollup, so if e.g. keys were ['year', 'animal'], then our code at the top would expand to:

rolledupB = d3.rollup(
  cans,
  ([v]) => v[value],
  d => d.year,
  d => d.animal
)

So far what we are doing is looking inside the list 'cans', making a map of the years that show up in the objects in the list, then inside each year making a map of the animals that show up in objects matching that year. All matching objects are stuck into an array, which is then passed to the reducer function, resulting in the final value for that map key.
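
The grouping described so far can be sketched in plain JavaScript. `nestedRollup` is a hypothetical helper written for illustration, not D3's implementation, and the `cansSample` rows are made up.

```javascript
// Hypothetical nested rollup: one Map level per key function, reducer applied at the leaves.
function nestedRollup(rows, reduce, ...keyFns) {
  if (keyFns.length === 0) return reduce(rows);   // no keys left: reduce this group
  const [keyFn, ...rest] = keyFns;
  const groups = new Map();
  for (const d of rows) {
    const k = keyFn(d);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(d);
  }
  // Recurse into each group with the remaining key functions.
  return new Map([...groups].map(([k, group]) => [k, nestedRollup(group, reduce, ...rest)]));
}

const keys = ["year", "animal"];
const cansSample = [
  { year: "1990", animal: "Beef cattle", no3: "2660.8" },
  { year: "1990", animal: "Sheep", no3: "12.5" },
  { year: "1991", animal: "Sheep", no3: "13.0" }
];

// Identity reducer, so the leaves show the arrays that a reducer would receive.
const nested = nestedRollup(cansSample, group => group, ...keys.map(k => d => d[k]));
console.log(nested.get("1990").get("Sheep")); // [{ year: "1990", animal: "Sheep", no3: "12.5" }]
```

With the identity reducer, each leaf is the array of matching rows, which is the input the real reducer gets.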

Next, ([v]) => v[value]. What this does is reduces the final output map values (which were arrays), grabbing the first array entry inside, and then pulling the property with the name of whatever value is set to out of it. If we used an identity transformation instead, these aggregated outputs would look like e.g.:

[{
  year: "2001",
  animal: "Beef cattle",
  no3: "0"
}]

In this case this function ([v]) => v[value] turns out to be equivalent to:
(arr) => arr[0].no3 or ([v]) => v.no3. So we could expand our top code to:

rolledupB = d3.rollup(
  cans,
  ([v]) => v.no3,
  d => d.year,
  d => d.animal
)

It is not clear to me whether some combinations of year + animal might have multiple rows. If so, you might want to do something in the reducer other than grab the 'no3' key from the first one (for instance, returning all of the no3 values or combining them). Also, you might want to coerce the no3 value to a different type, e.g. ([v]) => +v.no3 would make it a number.
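
To make that concrete, here is a small plain-JavaScript comparison of three reducers applied to one group. The group is made-up data with two rows for the same year and animal, and the reducer names are hypothetical.

```javascript
// A made-up group, as the reducer would receive it if a year + animal pair had two rows.
const group = [
  { year: "2001", animal: "Beef cattle", no3: "1.5" },
  { year: "2001", animal: "Beef cattle", no3: "2.5" }
];

const firstOnly = ([v]) => v.no3;        // keeps the first row's value, still a string
const firstAsNumber = ([v]) => +v.no3;   // same row, coerced to a number
const total = rowsInGroup =>             // adds up every row in the group
  rowsInGroup.reduce((sum, d) => sum + +d.no3, 0);

console.log(firstOnly(group));     // "1.5"
console.log(firstAsNumber(group)); // 1.5
console.log(total(group));         // 4
```

The first two silently ignore the second row, so which reducer fits depends on whether duplicate rows can occur in the data.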


I would advise against setting value = cans.columns.slice(-1), which is the array ['no3']. When you try to use this array as a property name, JavaScript coerces it to the string 'no3', but relying on this type coercion is brittle and confusing. It would be better to say
value = cans.columns.slice(-1)[0] or
value = cans.columns[cans.columns.length - 1]
each of which will be the string 'no3' directly, not wrapped up in an array.

(Would also be clearer to give this variable a more explicit name than value)
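
A quick plain-JavaScript illustration of that coercion, using a stand-in `columns` array and a made-up row (both are assumptions for the example, not the notebook's data):

```javascript
// Stand-in for cans.columns; the real array comes from FileAttachment(...).csv().
const columns = ["year", "animal", "no3"];
const row = { year: "1990", animal: "Beef cattle", no3: "2660.8" };

const valueAsArray = columns.slice(-1);             // ["no3"], an array
const valueAsString = columns[columns.length - 1];  // "no3", a string

console.log(row[valueAsArray]);      // "2660.8", only because ["no3"] is coerced to "no3"
console.log(row[valueAsString]);     // "2660.8"
console.log(row[columns.slice(-2)]); // undefined: ["animal", "no3"] is coerced to "animal,no3"
```

The last line shows why the array form is brittle: it only works while the array has exactly one element.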

Thanks, Jacob, that breakdown of the rollup was very useful.
Also those tips are good too, thanks for the help.