Monday 8 September 2008

U+2060: Word Joiner

Some scripts have positions that would naturally be a word boundary, and so a place where a line may wrap, but that should not be treated as one in a particular instance. For that purpose Unicode provides the special character WJ (Word Joiner) at code point U+2060, a zero-width character that you insert wherever you don't want a boundary. Its serialized forms are:

Encoding form  Endianness  S0    S1    S2    S3
UTF-32         be          0x00  0x00  0x20  0x60
UTF-32         le          0x60  0x20  0x00  0x00
UTF-16         be          0x20  0x60
UTF-16         le          0x60  0x20
UTF-8          n/a         0xE2  0x81  0xA0
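
The byte sequences above can be reproduced with a few lines of Python, shown here as a minimal sketch rather than anything definitive. The names `WORD_JOINER`, `encoded_bytes`, `ENCODINGS` and `join_without_break` are invented for this illustration; only `str.encode` and the codec names come from the Python standard library.

```python
# U+2060 WORD JOINER: zero-width, and asks the renderer not to break here.
WORD_JOINER = "\u2060"


def encoded_bytes(text, codec):
    """Return the bytes of `text` in `codec` as a string like '0xE2 0x81 0xA0'."""
    return " ".join(f"0x{b:02X}" for b in text.encode(codec))


# Codecs with an explicit byte order do not emit a byte order mark,
# so each result contains only the bytes of U+2060 itself.
ENCODINGS = {
    codec: encoded_bytes(WORD_JOINER, codec)
    for codec in ("utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", "utf-8")
}


def join_without_break(left, right):
    """Glue two pieces of text together with a Word Joiner in between."""
    return left + WORD_JOINER + right


if __name__ == "__main__":
    for codec, value in ENCODINGS.items():
        print(f"{codec:10} {value}")
```

Whether the joined text actually stays on one line still depends on the program that renders it; the helper only puts the character in the string.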


But you'd be surprised how few programs support this vital Unicode feature. Try getting Japanese to wrap between words in a .NET WinForms control.