Keywords in the configuration file.
This works: ``` [keywords] primary= Latin ```
And this works too: ``` [keywords] primary= _Ӕъйя ```
But this does not work: ``` [keywords] primary= Ӕъйя ```
How can I make it work? ![1](https://user-images.githubusercontent.com/65322910/81885289-b872e700-95a2-11...) ![2](https://user-images.githubusercontent.com/65322910/81885292-b9a41400-95a2-11...)
What filetype?
Does it work only when the first character is in the subset `[a-zA-Z_]`, and not otherwise? I will test it myself once @elextr's question is answered so I can try to reproduce it.
I'm not sure I understand the question about the file type. The encoding is UTF-8 everywhere. This happens in any configuration file, for example filetypes.c, or any other (if that is what you meant). And yes, it looks like only keywords that begin with `[a-zA-Z_]` are recognized, but I did not find any obvious restriction on this in the source. (Maybe I missed it.)
OS:
- Kernel: Linux 5.4.0-29-generic (x86_64), Version #33-Ubuntu SMP Wed Apr 29 14:32:27 UTC 2020
- C Library: GNU C Library / (Ubuntu GLIBC 2.31-0ubuntu9) 2.31
- Distribution: Ubuntu 20.04 LTS

Geany: (built on or after 2020-03-22), using GTK+ v3.24.18 and GLib v2.64.2 runtime libraries
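The observed behaviour is consistent with a lexer that only accepts identifiers matching an ASCII-first pattern such as `[a-zA-Z_]\w*`. This is not Geany's actual lexer code, just a small Python sketch to reproduce the hypothesis from the examples above:

```python
import re

# Hypothetical identifier pattern: an ASCII letter or underscore first,
# then any word characters (which in Python 3 include Unicode letters).
# This is an illustration, NOT the real Scintilla lexer logic.
IDENT = re.compile(r"[a-zA-Z_]\w*")

def is_recognised(word):
    """True if the whole word matches the ASCII-first pattern."""
    return IDENT.fullmatch(word) is not None

print(is_recognised("Latin"))   # ASCII first character -> recognised
print(is_recognised("_Ӕъйя"))   # underscore first -> recognised
print(is_recognised("Ӕъйя"))    # Cyrillic first character -> not recognised
```

This reproduces exactly the three cases from the original report: `Latin` and `_Ӕъйя` match, `Ӕъйя` does not.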
@fuckgithubanyway by "filetype" I mean: which filetype file did you add the keywords to?
Filetypes are defined by the languages they represent. For languages that have keywords defined in the language, the lists just allow new words to be added in new versions of the standard so the code doesn't have to be changed all the time.
So using your example of C, identifiers (and therefore keywords, which are reserved identifiers) are "an arbitrarily long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters specified using \u and \U escape notation (since C99). A valid identifier must begin with a non-digit character". It's implementation-defined whether Unicode encodings are allowed for those characters. I suspect the C lexer implementation does not allow them.
It should be noted that all current C keywords start with ASCII alphabetics or underscore, so it's valid for a lexer to only recognise that.
I understand you. Here is a more precise answer to your question about the file type: I am trying to implement proper syntax highlighting for a language Geany does not support. It is a language with C-like syntax. To do this, I created a configuration file myself, following the instructions. ![1](https://user-images.githubusercontent.com/65322910/81892947-07c21300-95b5-11...) Everything works well except for keywords that do not match `[a-zA-Z_]\w*`. ![2](https://user-images.githubusercontent.com/65322910/81893029-48ba2780-95b5-11...) In an attempt to solve the problem, I tried adding new keywords to the configuration of C, Python and other languages. UTF keywords do not work anywhere.
Here is another example of the same effect, in pure C. ![c](https://user-images.githubusercontent.com/65322910/81893760-f2e67f00-95b6-11...)
> I am trying to implement proper syntax highlighting for a language Geany does not support. It is a language with C-like syntax. To do this, I created a configuration file myself, following the instructions.
Well, it's better to actually state what you are trying to do when asking for help. The answer is basically what I gave above: custom filetypes (as you clearly understand from the `styling=C` setting) are based on an existing filetype.
Custom filetypes cannot change the code used by the existing filetype, and so are limited to the extent of any flexibility built into the existing filetype. Some filetypes are not flexible at all, and they are not required to be. They were written to address a particular language and cannot reasonably be expected to anticipate future uses (some may say abuses :).
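As a concrete illustration of a custom filetype built on an existing one, a minimal definition file might look like the following. This is a sketch based on Geany's documented custom-filetype mechanism; the filetype name `Foo`, the extension, and the keyword list are invented for this example:

```
# ~/.config/geany/filedefs/filetypes.Foo.conf
# Reuse the C lexer's styling for this custom filetype.
[styling=C]

[keywords]
# With the C lexer, only ASCII-first keywords will be recognised;
# the Cyrillic-first keyword below would be ignored, as discussed above.
primary=if else while myword Ӕъйя

[settings]
# Which built-in filetype's lexer to use for highlighting.
lexer_filetype=C
extension=foo
```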
Note that the `keyword` list only applies to the highlighting lexers; the symbol parsers ignore that list, since they have no idea what the keywords mean.
The symbol parsers must parse the actual language for symbol definitions (such as the typedef you show in C above), so they will likely be even less flexible than the highlighting lexers, since they need to know semantics (i.e. that `typedef` defines types).
As I noted on the previous post, C is only required to accept identifiers using the portable character set (basically ASCII) and the symbol parser for C only recognises identifiers using those characters.
Thank you for your responses. To clarify: this fails not only with C and C-like settings; you can reproduce the same thing with the Python settings, etc. It does not work at all: Geany does not recognize keywords starting with arbitrary UTF-8 characters. Tell me, please, is there any chance to fix this somehow?
Most programming languages do not support identifiers with arbitrary bytes. Some do support it, but Scintilla lexers that do are a small subset of those. You could open bug reports for Scintilla lexers that should support arbitrary identifier characters but don't.
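As a side note on which languages allow such identifiers: Python 3, for instance, accepts any Unicode letter at the start of an identifier, so the keyword from the original example is itself a valid Python name even though many lexers reject it. A quick illustrative check:

```python
# Python 3 identifiers may start with any Unicode letter (XID_Start),
# so a Cyrillic-only name is a valid identifier.
word = "Ӕъйя"
print(word.isidentifier())    # True: every character is a Unicode letter

# A digit-first string is not a valid identifier, in Python or in C.
print("1abc".isidentifier())  # False
```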
The Python lexer works if you set the `lexer.python.unicode.identifiers=1` lexer property. For example, `def Ü():` then highlights `Ü` properly.
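That property goes into the filetype definition file via Geany's `[lexer_properties]` section. A sketch of where it would sit for the Python filetype (whether a given Geany/Scintilla version honours this particular property is worth verifying):

```
# User copy of the Python filetype file, e.g.
# ~/.config/geany/filedefs/filetypes.python
[lexer_properties]
# Let the Scintilla Python lexer accept Unicode identifier characters.
lexer.python.unicode.identifiers=1
```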
But the symbol parser or symbol handling doesn't, and spews `Warning: : ignoring null tag in /tmp/untitled.py(line: 29)` errors from `ctags/main/entry.c`, so I guess the symbol parser system doesn't support it yet. As @codebrainz said, you can open issues on the universal-ctags project that supplies the parsers (and open a pull request to update Geany :)
Thank you guys for the answers.