Converting strings to numbers (dates)

Hi All! I am working on a dataset looking at Top Albums Genres over time.

The problem I’m having is converting the “rel date” from a string to a number/date. I’ve tried d3.timeParse but it’s not helping.

For example, I have dates going back to 1966 and it’s converting that year to “2066” :sweat_smile:

I’m still getting to grips with Observable. Your help will be much appreciated, thanks :slight_smile:

This is one of the major problems with two-digit years. If it’s possible to get the data with a better date format, I’d recommend it. But given this data, there is something we can do.

To make this problem reasonable, we can assume that all dates are in the past 100 years. So if we find a date in the future, we simply subtract 100 years from it, and it should get back into range.
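As a rough standalone illustration of that rule, here is a minimal sketch in plain JavaScript. The helper name `fixCentury` and the sample dates are made up for this example and are not part of the notebook; a fixed `now` is passed in only so the result is reproducible.

```javascript
// Hypothetical helper (not from the notebook): if a parsed date lies in the
// future, assume the century was guessed wrong and move it back 100 years.
function fixCentury(date, now = new Date()) {
  const fixed = new Date(date.getTime()); // copy so the input is not mutated
  if (fixed > now) fixed.setFullYear(fixed.getFullYear() - 100);
  return fixed;
}

// A fixed "now" keeps the example reproducible.
const now = new Date(2021, 0, 1);
const misparsed = new Date(2066, 4, 16); // a "66" that was read as 2066
const alreadyFine = new Date(2015, 2, 15);

console.log(fixCentury(misparsed, now).getFullYear());   // 1966
console.log(fixCentury(alreadyFine, now).getFullYear()); // 2015
```

Note the assumption this relies on: no album in the data is more than 100 years old, and none has a release date in the future.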

The heart of that idea is this modification to your data loading cell:

top5000 = {
  let data = await FileAttachment("Top5000.csv").csv();
  let now = new Date();
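  // parser is assumed to be the date parser already defined in the notebook
  return data.map(d => {
    let rel_date = parser(d.rel_date);
    // A date in the future means the century was guessed wrong; shift it back.
    if (rel_date > now) rel_date.setFullYear(rel_date.getFullYear() - 100);
    return { ...d, rel_date };
  });
}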

You can see it integrated into a full notebook here:


Thank you for all your help.

Hi again! I have another problem which I thought I had sorted, but I haven’t. I would love to get your help again :sweat_smile:

In the data, each object has multiple genres (gen). I want to limit the “gen” to one for each artist in order to make the data easier to work with.

My solution: turn the “gen” property string into an array using forEach and split(). This worked for a small section of the data (gens: “Conscious Hip Hop”).

But it hasn’t gone to plan when using this method for the whole data set :sweat:

Hope this makes sense!

I’d recommend editing the top5000 cell to do that work. You can modify the body of the map function there like this to take the first genre:

let rel_date = parser(d.rel_date);
if (rel_date > now) rel_date.setFullYear(rel_date.getFullYear() - 100);
let genre = d.gens.split(', ')[0];
return { ...d, rel_date, genre };

In general, I find that forEach is almost never what I want. It’s usually more convenient to use map if you want to produce a modified array, or for-of loops if you just want to iterate.
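To make that comparison concrete, here is a small sketch in plain JavaScript. The `rows` array is made-up sample data standing in for the CSV, and the assumption that genres are separated by a comma and a space comes from the split call above.

```javascript
// Made-up rows standing in for the CSV data; only the `gens` field matters here.
const rows = [
  { album: "A", gens: "Conscious Hip Hop, Jazz Rap" },
  { album: "B", gens: "Art Rock, Progressive Rock" },
  { album: "C", gens: "Soul" }
];

// forEach returns undefined, so results have to be pushed into an outer array.
const viaForEach = [];
rows.forEach(d => viaForEach.push({ ...d, genre: d.gens.split(", ")[0] }));

// map builds the new array directly and leaves the original rows untouched.
const viaMap = rows.map(d => ({ ...d, genre: d.gens.split(", ")[0] }));

// for-of suits plain iteration, e.g. counting albums per first genre.
const counts = new Map();
for (const d of viaMap) counts.set(d.genre, (counts.get(d.genre) || 0) + 1);

console.log(viaMap.map(d => d.genre).join(" | ")); // Conscious Hip Hop | Art Rock | Soul
```

Both versions produce the same array; the map version just avoids the extra variable and the push.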

Thank you again!

I’m just used to loops and have never used map before. Looking back, using a loop doesn’t really make sense, but thank you again!

I’m back for the nth time :persevere: :persevere:!

It’s the same dataset. My issue is that there are too many values under the “gens” column. For example, the genre “Rock” has so many sub-genres.

How do I consolidate the data so there are clear genres without the sub-genres? I have already started doing it manually, but is there a way to fast-track this process?

If not, I will either continue or find an easier dataset, as I would have to go through 2744 columns :sweat_smile:

Thank you very much!