Keywords in the configuration file.
This works: ``` [keywords] primary= Latin ```
And this works too: ``` [keywords] primary= _Ӕъйя ```
But this does not work: ``` [keywords] primary= Ӕъйя ```
How can I make it work? ![1](https://user-images.githubusercontent.com/65322910/81885289-b872e700-95a2-11...) ![2](https://user-images.githubusercontent.com/65322910/81885292-b9a41400-95a2-11...)
What filetype?
Does it work only when the first character is in the subset `[a-zA-Z_]`, and not otherwise? I will test it myself once @elextr's question is answered so I can try to reproduce it.
I'm not sure I understand the question about the file type. The encoding is UTF-8 everywhere. This happens in any configuration file, for example filetypes.c, or any other (if that is what you meant). And yes, it looks like only keywords that begin with `[a-zA-Z_]` are recognized, but I did not find any obvious restriction on this in the source. (Maybe I missed it.)
OS:
- Kernel: Linux 5.4.0-29-generic (x86_64), Version #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020
- C Library: GNU C Library / (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
- Distribution: Ubuntu 20.04 LTS

Geany: (built on or after 2020-03-22), using GTK+ v3.24.18 and GLib v2.64.2 runtime libraries
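The observed behaviour is consistent with a lexer that only accepts identifiers matching an ASCII-first pattern such as `[a-zA-Z_]\w*`. This is not Geany's actual lexer code, just a small Python sketch to reproduce the hypothesis from the examples above:

```python
import re

# Hypothetical identifier pattern: an ASCII letter or underscore first,
# then any word characters (which in Python 3 include Unicode letters).
# This is an illustration, NOT the real Scintilla lexer logic.
IDENT = re.compile(r"[a-zA-Z_]\w*")

def is_recognised(word):
    """True if the whole word matches the ASCII-first pattern."""
    return IDENT.fullmatch(word) is not None

print(is_recognised("Latin"))   # ASCII first character -> recognised
print(is_recognised("_Ӕъйя"))   # underscore first -> recognised
print(is_recognised("Ӕъйя"))    # Cyrillic first character -> not recognised
```

This reproduces exactly the three cases from the original report: `Latin` and `_Ӕъйя` match, `Ӕъйя` does not.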
@fuckgithubanyway by "filetype" I mean: which filetype file did you add the keywords to?
Filetypes are defined by the languages they represent. For languages that have keywords defined in the language, the lists just allow new words to be added in new versions of the standard so the code doesn't have to be changed all the time.
So using your example of C, identifiers (and therefore keywords, which are reserved identifiers) are "an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation (since C99). A valid identifier must begin with a non-digit character". It's implementation-defined whether Unicode encodings are allowed for those characters. I suspect the C lexer implementation does not allow them.
It should be noted that all current C keywords start with ASCII alphabetics or underscore, so it's valid for a lexer to only recognise that.
I understand you. Here is a more precise answer to your question about the file type: I am trying to implement proper syntax highlighting for a language Geany does not support. It is a language with C-like syntax. To do this, I created a configuration file myself, following the instructions. ![1](https://user-images.githubusercontent.com/65322910/81892947-07c21300-95b5-11...) Everything works well except for keywords that do not match `[a-zA-Z_]\w*`. ![2](https://user-images.githubusercontent.com/65322910/81893029-48ba2780-95b5-11...) In an attempt to solve the problem, I tried adding new keywords to the configuration of C, Python and other languages. UTF keywords do not work anywhere.
Here is another example of the same effect, in pure C. ![c](https://user-images.githubusercontent.com/65322910/81893760-f2e67f00-95b6-11...)
> I am trying to implement proper syntax highlighting for a language Geany does not support. It is a language with C-like syntax. To do this, I created a configuration file myself, following the instructions.
Well, it's better to actually state what you are trying to do when asking for help. The answer is basically what I gave above: custom filetypes (as you clearly understand from the `styling=C` setting) are based on an existing filetype.
Custom filetypes cannot change the code used by the existing filetype, and so are limited to the extent of any flexibility built into the existing filetype. Some filetypes are not flexible at all, and they are not required to be. They were written to address a particular language and cannot reasonably be expected to anticipate future uses (some may say abuses :).
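As a concrete illustration of a custom filetype built on an existing one, a minimal definition file might look like the following. This is a sketch based on Geany's documented custom-filetype mechanism; the filetype name `Foo`, the extension, and the keyword list are invented for this example:

```
# ~/.config/geany/filedefs/filetypes.Foo.conf
# Reuse the C lexer's styling for this custom filetype.
[styling=C]

[keywords]
# With the C lexer, only ASCII-first keywords will be recognised;
# the Cyrillic-first keyword below would be ignored, as discussed above.
primary=if else while myword Ӕъйя

[settings]
# Which built-in filetype's lexer to use for highlighting.
lexer_filetype=C
extension=foo
```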
Note that the `keyword` list only applies to the highlighting lexers; the symbol parsers ignore that list, since they have no idea what the keywords mean.
The symbol parsers must parse the actual language for symbol definitions (such as the typedef you show in C above), so they will likely be even less flexible than the highlighting lexers, since they need to know semantics (i.e. that `typedef` defines types).
As I noted on the previous post, C is only required to accept identifiers using the portable character set (basically ASCII) and the symbol parser for C only recognises identifiers using those characters.
Thank you for your responses. To clarify: this fails not only with C and C-like settings; you can reproduce the same thing with the Python settings, etc. It does not work at all: Geany does not recognize keywords starting with arbitrary UTF-8 characters. Tell me, please, is there any chance to fix this somehow?
Most programming languages do not support identifiers with arbitrary bytes. Some do support it, but Scintilla lexers that do are a small subset of those. You could open bug reports for Scintilla lexers that should support arbitrary identifier characters but don't.
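As a side note on which languages allow such identifiers: Python 3, for instance, accepts any Unicode letter at the start of an identifier, so the keyword from the original example is itself a valid Python name even though many lexers reject it. A quick illustrative check:

```python
# Python 3 identifiers may start with any Unicode letter (XID_Start),
# so a Cyrillic-only name is a valid identifier.
word = "Ӕъйя"
print(word.isidentifier())    # True: every character is a Unicode letter

# A digit-first string is not a valid identifier, in Python or in C.
print("1abc".isidentifier())  # False
```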
The Python lexer works if you set the `lexer.python.unicode.identifiers=1` lexer property. For example, `def Ü():` then highlights `Ü` properly.
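That property goes into the filetype definition file via Geany's `[lexer_properties]` section. A sketch of where it would sit for the Python filetype (whether a given Geany/Scintilla version honours this particular property is worth verifying):

```
# User copy of the Python filetype file, e.g.
# ~/.config/geany/filedefs/filetypes.python
[lexer_properties]
# Let the Scintilla Python lexer accept Unicode identifier characters.
lexer.python.unicode.identifiers=1
```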
But the symbol parser or symbol handling doesn't, and spews `Warning: : ignoring null tag in /tmp/untitled.py(line: 29)` errors from `ctags/main/entry.c`, so I guess the symbol parser system doesn't support it yet. As @codebrainz said, you can open issues on the universal-ctags project that supplies the parsers (and open a pull request to update Geany :)
Thank you guys for the answers.