"Invalid character" for UTF-16 characters on Safari

Seems like Babel doesn’t handle some characters properly on Safari.
I wanted to use one of the UTF-16 mathematical “tau” letters (e.g. U+1D6D5) as a name for a cell that represents the only proper circle constant (troll intended :) ), and the notebook breaks in Safari:

Note that:

  • It only breaks when using the value, not when declaring it
  • It doesn’t break when using a UTF-8 version of the letter (e.g. U+03C4)
  • It doesn’t break in Chrome

I guess you meant to link here:

The character you use in the notebook is actually 𝞃 U+1D783. Interestingly, the error message I see in Safari is this:
SyntaxError: Invalid character '\ud835'

I get the same message when I replace both instances of U+1D783 with 𝛕 U+1D6D5, which makes sense since the hex encoding for both characters in big-endian UTF-16 starts with D835.
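To make the D835 observation concrete, here is a minimal sketch of the standard surrogate-pair arithmetic. The helper name `toUtf16Units` is made up for illustration and is not part of any library:

```typescript
// Split a Unicode code point into the UTF-16 code units that JavaScript
// strings are made of. `toUtf16Units` is a hypothetical helper name.
function toUtf16Units(codePoint: number): string[] {
  // Format a 16-bit value as four uppercase hex digits.
  const hex = (n: number): string =>
    ("0000" + n.toString(16).toUpperCase()).slice(-4);

  // Code points up to U+FFFF fit in a single code unit.
  if (codePoint <= 0xffff) {
    return [hex(codePoint)];
  }

  // Anything above U+FFFF is split into a high and a low surrogate.
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return [hex(high), hex(low)];
}

console.log(toUtf16Units(0x1d783).join(" ")); // D835 DF83
console.log(toUtf16Units(0x1d6d5).join(" ")); // D835 DED5
console.log(toUtf16Units(0x03c4).join(" "));  // 03C4
```

Both mathematical tau letters share the high surrogate D835, which is the first code unit Safari complains about, while the regular Greek tau needs only one code unit.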

The Observable compiler doesn’t use Babel and only applies a minimal transformation to the code in cells (for the most part, just wrapping them in a function).

The variable name “𝞃” (MATHEMATICAL SANS-SERIF BOLD SMALL TAU) is valid in ES6, but not ES5, which might explain why Safari doesn’t like it. The variable name “τ” (GREEK SMALL LETTER TAU) does, however, work in Safari.

I recommend this site for validating variable names:

https://mothereff.in/js-variables
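For a quick check of what the engine you are actually running accepts, something like the following rough sketch may help. The function name `engineAcceptsIdentifier` is made up, the ASCII guard is a simplification, and `new Function` can be blocked by a strict Content Security Policy, so treat this as a debugging aid rather than a validator:

```typescript
// Ask the current JavaScript engine whether it parses `name` as a variable
// name. `engineAcceptsIdentifier` is a hypothetical helper, not a real API.
function engineAcceptsIdentifier(name: string): boolean {
  // Simplified guard: reject ASCII characters that cannot appear in an
  // identifier (spaces, commas, "=", ";" ...), so the input cannot smuggle
  // in extra syntax. Non-ASCII characters are left for the engine to judge.
  if (/[^A-Za-z0-9_$\u0080-\uffff]/.test(name)) {
    return false;
  }
  try {
    // Only parses the declaration; the resulting function is never called.
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    // A SyntaxError here means the engine rejected the name.
    return false;
  }
}

console.log(engineAcceptsIdentifier("τ"));    // Greek small letter tau
console.log(engineAcceptsIdentifier("𝞃"));    // result depends on the engine
console.log(engineAcceptsIdentifier("1abc")); // identifiers cannot start with a digit
```

Running this in the Safari and Chrome consoles should show whether the two engines disagree about a given name.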

For a few years, D3 used Greek variable names in d3-geo and d3-geo-projection to try to improve the readability of mathematical functions. Whether or not this succeeded is of course subjective (e.g.). However, we abandoned this approach and I don’t recommend it because it’s typically not worth the trouble; it makes it harder for others to edit. Many readers won’t have a Greek alternate keyboard configured.

Well I learned a few interesting things here… Thanks for the additional information Mike.

A secondary issue was that people (at least in some cases) had to be explicit about serving the JS with a UTF-8 charset header.

Please don’t use the obscure “MATHEMATICAL blahblah” code points if you can help it (even in prose/formulas; not just talking about code here). They have poor font support and show up as boxes on many common devices. Stick to the regular Greek letter code points.

Yep, but we solved the encoding issue in D3 by converting to ASCII. And at least as far as Observable’s concerned, your code will always be served UTF-8.