How should this be identified then? Just by ranges or something?
It's a while since I came across this, but IIRC there is a separate Indic property in the Unicode standard that says something about how it combines, because the rules of Indic scripts are complex (see comments above ;-)
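Roughly what checking by property rather than by ranges looks like with GLib (a sketch, not what the code currently does; `is_combining_mark` is a hypothetical helper, and GLib doesn't expose the finer Indic properties like Indic_Syllabic_Category):

```c
/* Sketch: identify combining marks by Unicode general category rather
 * than hard-coded ranges. GLib exposes the category via
 * g_unichar_type(), but not the finer Indic properties, so this only
 * covers the generic mark categories. */
#include <glib.h>

static gboolean
is_combining_mark (gunichar ch)   /* hypothetical helper */
{
  switch (g_unichar_type (ch))
    {
    case G_UNICODE_NON_SPACING_MARK:  /* Mn, e.g. U+0301 */
    case G_UNICODE_SPACING_MARK:      /* Mc, common in Indic scripts */
    case G_UNICODE_ENCLOSING_MARK:    /* Me */
      return TRUE;
    default:
      return FALSE;
    }
}
```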
If that's all we're after, I can keep the normalization step and remove the manual (incomplete?) combining character support, which should still do the right thing™ in the vast majority of cases
Well, the NFKC[^1] normalization should handle a lot of cases by itself. The extra combining character support adds the case where there is no pre-combined code point, so it handles some more cases, but what proportion of those additional cases it gets correct I can't say. So it's better to keep it simple even if that misses a few cases.
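To make the two halves concrete, a minimal sketch of what the normalization step alone buys (assuming GLib's `g_utf8_normalize`; the decomposed literal is just an example):

```c
/* Sketch: NFKC composition via GLib. "e" + U+0301 composes to the
 * precomposed U+00E9, but a sequence with no precomposed form stays
 * decomposed, which is the case the manual combining support was
 * trying to patch over. */
#include <glib.h>
#include <stdio.h>

int
main (void)
{
  const gchar *decomposed = "e\xcc\x81";  /* 'e' + U+0301 COMBINING ACUTE ACCENT */
  gchar *nfkc = g_utf8_normalize (decomposed, -1, G_NORMALIZE_ALL_COMPOSE);

  /* U+00E9 has a precomposed form, so length drops from 2 to 1 code point */
  printf ("%ld -> %ld code points\n",
          g_utf8_strlen (decomposed, -1), g_utf8_strlen (nfkc, -1));

  g_free (nfkc);
  return 0;
}
```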
Not to mention the fact that it's currently terribly broken, yet nobody has complained before.
Yes, it's hardly worth the effort to complicate a capability that appears to be little used; just being safe (i.e. selecting proper code points) is enough, since it can always be manually overridden if the simple answer is wrong.
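Something like this is all the "being safe" part needs to be (`snap_to_codepoint` is a hypothetical helper, just to show the idea):

```c
/* Sketch: never split a selection mid-code-point. g_utf8_next_char
 * steps over whole UTF-8 sequences, so a raw byte offset can be
 * snapped down to the nearest code point boundary. */
#include <glib.h>

static const gchar *
snap_to_codepoint (const gchar *str, gsize byte_offset)  /* hypothetical helper */
{
  const gchar *p = str;
  const gchar *prev = str;

  /* Walk code point by code point until we pass the requested offset;
   * prev then holds the last boundary at or before it. */
  while ((gsize) (p - str) <= byte_offset && *p != '\0')
    {
      prev = p;
      p = g_utf8_next_char (p);
    }
  return prev;
}
```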
[^1]: Why did glib use different names? There is a standard (NFC, NFKC, etc.), so why did they invent their own names? It's a guess which standard form each glib name corresponds to, and as usual it's not documented!!! [end rant] That's why my suggestion of `G_NORMALIZE_ALL_COMPOSE` was so tentative.
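(For what it's worth, my reading of GLib's gunicode.h is that the enum itself carries the mapping via aliases; treat this as an assumption about whichever GLib version is in play:)

```c
/* From my reading of GLib's gunicode.h (check your version): the
 * GNormalizeMode enum aliases the home-grown names to the standard
 * ones, which at least pins down the correspondence. */
typedef enum {
  G_NORMALIZE_DEFAULT,                              /* NFD  */
  G_NORMALIZE_NFD = G_NORMALIZE_DEFAULT,
  G_NORMALIZE_DEFAULT_COMPOSE,                      /* NFC  */
  G_NORMALIZE_NFC = G_NORMALIZE_DEFAULT_COMPOSE,
  G_NORMALIZE_ALL,                                  /* NFKD */
  G_NORMALIZE_NFKD = G_NORMALIZE_ALL,
  G_NORMALIZE_ALL_COMPOSE,                          /* NFKC */
  G_NORMALIZE_NFKC = G_NORMALIZE_ALL_COMPOSE
} GNormalizeMode;
```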