
Bug: Incorrect transliteration of umlauts in slugs

When a title is converted into a slug, umlaut characters get converted into their base characters (e.g. “ö” → “o”). The correct transliteration would be:

  • ä → ae
  • ö → oe
  • ü → ue

Note: “ß” (Eszett) characters get converted correctly to “ss”.
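To make the difference concrete, here is a minimal sketch of the two behaviours. It is not Observable's actual slug code; `GERMAN_MAP`, `strip_diacritics`, and `slugify_german` are hypothetical names, and the table assumes the title is known to be German.

```python
import re
import unicodedata

# Hypothetical German-specific table (an assumption, not taken from any library).
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def strip_diacritics(text):
    """Generic fallback: decompose, then drop combining marks (ö becomes o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def slugify_german(title):
    """Apply the German table first, then the generic fallback, then hyphenate."""
    # Compose first so that "o" + combining diaeresis is seen as a single "ö".
    text = unicodedata.normalize("NFC", title)
    text = "".join(GERMAN_MAP.get(ch, ch) for ch in text)
    text = strip_diacritics(text).lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")
```

With this sketch, `strip_diacritics("schön")` gives `schon` (the behaviour reported above), while `slugify_german("Größe für Übung")` gives `groesse-fuer-uebung`.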


Which library is currently used for transliteration? I found the transliteration package to be the most robust (i.e., it would handle more strings out of the box), but it was also the only one that got the umlaut transliteration wrong.

The fun fact about this is that it’s actually correct — in English and (IIRC) Dutch. Those languages call it a dieresis instead of an umlaut, and it’s largely fallen out of use in English. So there isn’t really a good answer for what it should be converted to unless you’re going to have a lookup table of every word that uses an umlaut vs a dieresis.


… Yeah, or know the source language. And then there’s different systems/standards, so let’s not even go there. 🤯

Here’s our current approach:

It’s not language-specific, so how well this works depends on whether there’s an unambiguous mapping from Unicode to ASCII. That likely explains why an eszett is converted to a double ess, but a ü shouldn’t necessarily be converted to ue, as in French with argüer. (Edit: oops, I got my Eszetts and Betas confused.)
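As a rough illustration of why the source language matters, the sketch below takes an optional language hint and only applies the "ue" style mapping when the hint is German. This is an assumption-laden example, not the approach referenced above: `OVERRIDES`, `GENERIC`, and `to_ascii` are hypothetical names.

```python
import unicodedata

# Hypothetical per-language overrides; only German is filled in here.
OVERRIDES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue"},
}

# ß has no Unicode decomposition, so even the generic path needs an explicit entry.
GENERIC = {"ß": "ss"}

def to_ascii(text, lang=None):
    """Lowercase and transliterate, using language overrides when a hint is given."""
    table = {**GENERIC, **OVERRIDES.get(lang, {})}
    out = []
    for ch in unicodedata.normalize("NFC", text.lower()):
        if ch in table:
            out.append(table[ch])
        else:
            # Generic fallback: decompose and drop combining marks.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

Here `to_ascii("für", "de")` yields `fuer`, whereas `to_ascii("argüer", "fr")` yields `arguer`, and `to_ascii("Straße")` yields `strasse` with no hint at all.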

We’re planning on letting you specify the slug on publish so you can have control over this process.


@mike Something I kept wondering about: why did you choose to resolve conflicting slugs by adding “/2”, “/3”, etc. (instead of, e.g., “-2”)? Doesn’t this imply an additional path segment, potentially complicate routing, and suggest a relationship between notebooks that might have nothing in common other than their slug?
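For comparison, a suffix-based scheme could look roughly like the sketch below. It is purely illustrative: `unique_slug` and its `sep` parameter are hypothetical, and the set of taken slugs stands in for whatever lookup the real system does.

```python
def unique_slug(base, taken, sep="-"):
    """Return base if it is free, otherwise base + sep + the first free counter from 2."""
    if base not in taken:
        return base
    n = 2
    while f"{base}{sep}{n}" in taken:
        n += 1
    return f"{base}{sep}{n}"
```

With `sep="-"` a second "hello" becomes `hello-2` and stays a single path segment; with `sep="/"` the same function produces `hello/2`, which is the form the question is about.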