re-keying sliced data?

I continue to struggle with slice and splice, and would appreciate any help in better understanding if and how I can use them to reset my data keys.

Several months ago, @mootari and @bgchen patiently helped me to better understand these methods, but still I haven’t managed to use them successfully to ‘ignore’ header data and to re-set my keys to start on an arbitrary row.

Here’s how my data come to me:

Note that in the top red box are a few bits of provenience data describing the table. I want to just ignore these and to start my table keys with the ‘bottom’ red box.

I’ve tried a number of ways of slicing and splicing, but no matter what, I can’t seem to actually cut/ignore the header rows; I keep ending up with something like this:

Whereas I am after something like this (which I am currently able to achieve by manually cutting the header/provenience data from the source CSV):

Are slice and splice the correct tools?

I’ve been banging my head against the wall on this and searching the Internet, but am making no progress.

Here’s a link to a notebook with my source data loaded in for reference:

I would be very grateful for your help and guidance! :pray:


I guess that your file attachment is not quite ready for the .csv method, so you might call the .text method instead. You can then chop the unwanted lines off the front and pass the result to d3.csvParse. Here’s an illustration:

One potential problem is that the technique depends on the assumption that you know how many lines you need to chop. Alternatively, you might consider searching for the expected header line. I think you’ve got to know something about the file, though.
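For reference, here is a minimal sketch of the "chop a known number of lines" idea, written line by line rather than with character indices. The helper name `dropLeadingLines` and the sample text are made up for illustration; in a notebook the input would be the text of the file attachment, and the result would go to d3.csvParse.

```javascript
// Hypothetical helper (not part of d3): drop the first `n` lines of a text blob.
function dropLeadingLines(text, n) {
  // Split on either Unix or Windows line endings, skip n lines, re-join.
  return text.split(/\r?\n/).slice(n).join("\n");
}

// Made-up sample standing in for the attachment text:
// two preamble lines followed by the real header row and data.
const sample = [
  "Source: example survey",
  "Notes: preamble line to ignore",
  "id,name",
  "1,alpha",
  "2,beta"
].join("\n");

const body = dropLeadingLines(sample, 2);
console.log(body);
// In a notebook, the next step would be: d3.csvParse(body)
```

Because the preamble is removed before parsing, the parser only ever sees the real header row, so the object keys come from that row.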


Thanks @mcmcclur!

I am fortunate that the CSV files are consistent in the number of header rows that need to be chopped.

Thanks for your code, which clearly works! A couple of questions, if you don’t mind (following the ‘rubber duck’ method recommended by @mootari and @maliky):

parsed = {
  let idx = 0;  // we're setting the initial index value to 0
  for(let i = 0; i<4; i++) {  // we're starting an iterable value at zero, 
                             //and running a function on all iterables
                             // from the 4th position to the end
    idx = H01P_text.indexOf('\n', idx+1)   // the function re-defines the index by taking our attached file,
                                           // and reading in a new set of index values
                                           // by searching for line breaks (\n)
                                           // from the index (0) + 1  ???
  }
  return d3.csvParse(H01P_text.slice(idx+1)) // and now we're returning a copy of the array
                                            // using the index (0) + 1  ??? 
}

… I am not exactly following how you’re re-defining the index in this example, particularly with regard to setting idx+1 in both the .indexOf and return sections. Could you help to clarify a bit further?

Also, this seems like a lot of work just to effectively ‘ignore’ a few header rows from a CSV. I think I understand now that by calling the CSV method, my data are being interpreted into an array that is drawing its ‘shape’ (so to speak) from the first ‘key’ values… so that slicing them out won’t work (it assumes the shape is correct). I am sorta surprised that splice doesn’t do the trick, however, as it alters the original array and allows me to ‘pull out’ from it a section of the data. Yet despite being able to effectively ‘delete’ these values, they still exist as object keys. :frowning:

Is there a simpler solution?

Perhaps the following experiment might help clarify:

{
  let str = 'Mississippi';
  let firstOccurrence = str.indexOf('i', 0); // The 0 is optional
  let secondOccurrence = str.indexOf('i', firstOccurrence + 1); // Need that firstOccurrence index!
  let the_rest = str.slice(secondOccurrence);
  return [firstOccurrence, secondOccurrence, the_rest];
}
// Output:
// [1,4,'issippi']

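Note that the second argument of String.indexOf indicates the starting location of the search, so you’ve got to add one to the previous match’s index to avoid just re-finding that match. The same applies to the final slice: idx is the position of the line break itself, so slicing from idx+1 starts at the first character of the next line.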
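The same "search again from one past the last match" step can be wrapped in a loop to find the n-th occurrence of anything, which is what the line-chopping code does with '\n'. A rough sketch follows; `nthIndexOf` is a made-up name, not a built-in. Note that it starts from -1 so that the first search begins at position 0.

```javascript
// Hypothetical helper: index of the n-th occurrence of `needle` in `text`,
// or -1 if there are fewer than n occurrences.
function nthIndexOf(text, needle, n) {
  let idx = -1; // so the first search starts at position 0
  for (let i = 0; i < n; i++) {
    // Start one past the previous match to avoid re-finding it.
    idx = text.indexOf(needle, idx + 1);
    if (idx === -1) return -1;
  }
  return idx;
}

const third = nthIndexOf("Mississippi", "i", 3);
console.log(third); // 7

// Skipping two lines: find the 2nd line break, then slice from one past it.
const text = "a\nb\nc\nd";
const rest = text.slice(nthIndexOf(text, "\n", 2) + 1);
console.log(JSON.stringify(rest)); // "c\nd"
```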


I think it would probably be simpler to just search for the header row you expect, or even just the first few terms. In your case, I guess that

match_header_parse = d3.csvParse(
  H01P_text.slice(H01P_text.indexOf('id,vdc_municipality'))
)

seems to work. The problem with this technique is that the previous lines could conceivably contain that text. In either case, you’ve really got to have some knowledge about the structure of the file.
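One way to reduce that risk, sketched below, is to require the expected text to sit at the start of a line instead of anywhere in the file. The helper name `sliceFromHeader` and the sample text are invented for illustration (only the 'id,vdc_municipality' prefix comes from the file discussed above); the returned string would then go to d3.csvParse.

```javascript
// Hypothetical helper: return the text starting at the first line that
// begins with `headerPrefix`, or null if no such line exists.
function sliceFromHeader(text, headerPrefix) {
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex(line => line.startsWith(headerPrefix));
  if (start === -1) return null;
  return lines.slice(start).join("\n");
}

// Made-up sample in which the preamble happens to mention the header text.
const sample = [
  "Notes: columns are id,vdc_municipality",
  "Another preamble line",
  "id,vdc_municipality,value",
  "1,Foo,10"
].join("\n");

// A plain indexOf would match inside the first preamble line...
console.log(sample.indexOf("id,vdc_municipality")); // 19

// ...whereas the line-based search skips ahead to the real header row.
const csvText = sliceFromHeader(sample, "id,vdc_municipality");
console.log(csvText);
```

This still assumes you know how the header row begins, so the caveat about knowing the file's structure stands.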


Thanks Mark!

I appreciate your help both to understand the mechanics of how the index is built and searched, as well as for this elegant solution:

I appreciate also your cautionary note on the ‘elegant workaround’. For this particular problem, it works like a charm! Thank you!
