clacke@libranet.de ❌

Almost as difficult as off-by-one and cache invalidation


The most complicated thing about software development is people. And if you remove all the social and process-oriented things in software development, the most complicated thing is still people, but this time their artifacts in the problem space. Of those artifacts, the most complicated one is probably human language, and the way almost any piece of software you develop will make you suffer from human language is string processing.

This was a very enlightening piece to me. I know (well, to the extent that one can know without being knee-deep in linguistics … and also limited by the extent to which they are knowable) what graphemes are. But I didn't know that there is a well-defined and useful[1] concept “grapheme cluster” in Unicode (so well-defined it's defined at least twice! :-D), and I knew even less that there's even a language that uses them as its core abstraction for strings in its standard library. Good job, Swift!

Well, if Firefox is already Doing The Right Thing when you select text, it would be strange if grapheme clusters didn't end up in the Rust standard library again, wouldn't it?[2]
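To make the gap concrete, here is a minimal sketch using only the Rust standard library, which exposes UTF-8 bytes and code points but no grapheme clusters. The helper name `byte_and_code_point_counts` is made up for illustration, and "é" is just a convenient test case, not an example taken from the linked post.

```rust
// Sketch: one user-perceived character, several different "lengths".
// Uses only the standard library, which has no grapheme-cluster API.

/// Returns (UTF-8 byte count, code point count) for a string.
/// Hypothetical helper, named for this example only.
fn byte_and_code_point_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";
    // The same letter as the single precomposed code point U+00E9.
    let precomposed = "\u{e9}";

    let (bytes_d, points_d) = byte_and_code_point_counts(decomposed);
    let (bytes_p, points_p) = byte_and_code_point_counts(precomposed);

    // Decomposed form: 3 bytes, 2 code points.
    println!("decomposed:  {} bytes, {} code points", bytes_d, points_d);
    // Precomposed form: 2 bytes, 1 code point.
    println!("precomposed: {} bytes, {} code points", bytes_p, points_p);

    // Reversing by code point puts the combining accent before the 'e',
    // so it is no longer attached to its base letter.
    let reversed: String = decomposed.chars().rev().collect();
    println!("reversed by code point: {:?}", reversed);
}
```

A reader sees one character in both strings, yet neither the byte count nor the code point count says "1" for the decomposed form. A grapheme-cluster-aware count would report one for each.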

I wonder if there are any regex command line tools out there that support them. If there aren't, Swift, or Rust supported by that non-core library, would seem to be the simplest language to write one in. Maybe it will come to ripgrep (which is written in Rust) at some point?

http://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/

/via
https://lobste.rs/s/xx01z7/let_s_stop_ascribing_meaning_code_points

Further discussion with great comments at https://www.reddit.com/r/rust/comments/5o13kk/lets_stop_ascribing_meaning_to_code_points/.

[1] … or is it? This redditor says it's more complicated than that. But of course it is.

[2] As this lobstror (?) points out, maybe keeping it out of core is actually necessary, because the deep bowels of Unicode are a moving target.
