How should this be identified then? Just by ranges or something?
It's a while since I came across this, but IIRC there is a separate Indic property in the Unicode standard that says something about how it combines, because the rules of Indic scripts are complex (see comments above ;-)
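Roughly what checking by property rather than by ranges looks like with GLib (a sketch, not what the code currently does; `is_combining_mark` is a hypothetical helper, and GLib doesn't expose the finer Indic properties like Indic_Syllabic_Category):

```c
/* Sketch: identify combining marks by Unicode general category rather
 * than hard-coded ranges. GLib exposes the category via
 * g_unichar_type(), but not the finer Indic properties, so this only
 * covers the generic mark categories. */
#include <glib.h>

static gboolean
is_combining_mark (gunichar ch)   /* hypothetical helper */
{
  switch (g_unichar_type (ch))
    {
    case G_UNICODE_NON_SPACING_MARK:  /* Mn, e.g. U+0301 */
    case G_UNICODE_SPACING_MARK:      /* Mc, common in Indic scripts */
    case G_UNICODE_ENCLOSING_MARK:    /* Me */
      return TRUE;
    default:
      return FALSE;
    }
}
```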
If that's all we're after, I can keep the normalization step and remove the manual (incomplete?) combining character support, which should still do the right thing™ in the vast majority of cases
Well, the NFKC[^1] normalization should handle a lot of cases by itself. The extra combining character support adds the case where there is no pre-combined code point, so it handles some more cases, but what proportion of those additional cases it gets correct I can't say. So it's better to keep it simple even if that misses a few cases.
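To make the two halves concrete, a minimal sketch of what the normalization step alone buys (assuming GLib's `g_utf8_normalize`; the decomposed literal is just an example):

```c
/* Sketch: NFKC composition via GLib. "e" + U+0301 composes to the
 * precomposed U+00E9, but a sequence with no precomposed form stays
 * decomposed, which is the case the manual combining support was
 * trying to patch over. */
#include <glib.h>
#include <stdio.h>

int
main (void)
{
  const gchar *decomposed = "e\xcc\x81";  /* 'e' + U+0301 COMBINING ACUTE ACCENT */
  gchar *nfkc = g_utf8_normalize (decomposed, -1, G_NORMALIZE_ALL_COMPOSE);

  /* U+00E9 has a precomposed form, so length drops from 2 to 1 code point */
  printf ("%ld -> %ld code points\n",
          g_utf8_strlen (decomposed, -1), g_utf8_strlen (nfkc, -1));

  g_free (nfkc);
  return 0;
}
```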
Not to mention the fact that it's currently terribly broken, yet nobody has complained before.
Yes, it's hardly worth the effort to complicate a capability that appears to be little used; just being safe (i.e. selecting proper code points) is enough, since it can always be manually overridden if the simple answer is wrong.
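Something like this is all the "being safe" part needs to be (`snap_to_codepoint` is a hypothetical helper, just to show the idea):

```c
/* Sketch: never split a selection mid-code-point. g_utf8_next_char
 * steps over whole UTF-8 sequences, so a raw byte offset can be
 * snapped down to the nearest code point boundary. */
#include <glib.h>

static const gchar *
snap_to_codepoint (const gchar *str, gsize byte_offset)  /* hypothetical helper */
{
  const gchar *p = str;
  const gchar *prev = str;

  /* Walk code point by code point until we pass the requested offset;
   * prev then holds the last boundary at or before it. */
  while ((gsize) (p - str) <= byte_offset && *p != '\0')
    {
      prev = p;
      p = g_utf8_next_char (p);
    }
  return prev;
}
```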
[^1]: Why did glib use different names? There is a standard (NFC, NFKC, etc.), so why did they invent their own names? It's a guess which standard form each glib name corresponds to, and as usual it's not documented!!! [end rant] That's why my suggestion of `G_NORMALIZE_ALL_COMPOSE` was so tentative.
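(For what it's worth, my reading of GLib's gunicode.h is that the enum itself carries the mapping via aliases; treat this as an assumption about whichever GLib version is in play:)

```c
/* From my reading of GLib's gunicode.h (check your version): the
 * GNormalizeMode enum aliases the home-grown names to the standard
 * ones, which at least pins down the correspondence. */
typedef enum {
  G_NORMALIZE_DEFAULT,                              /* NFD  */
  G_NORMALIZE_NFD = G_NORMALIZE_DEFAULT,
  G_NORMALIZE_DEFAULT_COMPOSE,                      /* NFC  */
  G_NORMALIZE_NFC = G_NORMALIZE_DEFAULT_COMPOSE,
  G_NORMALIZE_ALL,                                  /* NFKD */
  G_NORMALIZE_NFKD = G_NORMALIZE_ALL,
  G_NORMALIZE_ALL_COMPOSE,                          /* NFKC */
  G_NORMALIZE_NFKC = G_NORMALIZE_ALL_COMPOSE
} GNormalizeMode;
```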