[Github-comments] Re: [geany/geany] ctags: Matlab: generate tag with only the function name, not the function name plus arguments (PR #3358)

3 Jan 2023


      ...
Regarding the regex ctags version of the parser vs Geany's version of the parser: we could use the regex ctags version. I was just thinking that, since we already have the hand-written version in Geany, it might serve as a base for a hand-written parser that could eventually be submitted upstream, so I kept the `geany_` parser. Hand-written parsers tend to offer better flexibility in parsing and are much faster than regex parsers. But before such a parser could be submitted upstream, it would have to offer all the functionality the current regex parser offers.
I think the custom parser is currently a bit messy and could use some restructuring, but your point is valid.
Right now, I think the only real advantage of the regex version of the parser is its readability, but it doesn't seem to be really leveraging the full power of regexes -- it is rather simple and can probably be translated to "plain C" easily.
(Also, at first glance, it seems that those regexes aren't too good either; for example, I believe the second one will match `functionality = 42` too.)
Re: speed, I see four "levels" in which the parser can be implemented:
1. Compare characters one by one
2. `strncmp()` and `strstr()`
3. **`sscanf()`** :bulb:
4. Regular expressions
I think the regex parser could easily be re-implemented using `sscanf()` as a faster alternative to regexes. If that's an option, it would be an elegant solution -- readable, efficient, and less prone to errors than options 1 and 2.
Something like
```c
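/* The width 255 assumes buffer holds at least 256 bytes; without a width, %[ can overflow it. */
if (sscanf(line, " function [%*[^]%]] = %255[A-Za-z0-9_]", buffer) == 1) ...
if (sscanf(line, " function%*[ \t]%*[A-Za-z0-9_] = %255[A-Za-z0-9_]", buffer) == 1) ...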
```
etc.
(where `" "` matches zero or more whitespace chars, `"%*[ \t]"` matches one or more spaces/tabs, `"%*[^]%]"` matches anything but `]` and `%`, etc -- it's not incredibly readable, but it's fast.)
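To make the idea a bit more concrete, here is a minimal sketch of how those patterns could be wrapped in one helper. The names `match_function_name` and `TAG_NAME_MAX` are made up for illustration (they are not part of Geany or ctags), and the third pattern, for functions without a return value, is my assumption about what the "etc." would cover. It also relies on `A-Z`-style ranges in scansets, which are implementation-defined in the C standard but work in the common C libraries.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit; the "%255[" widths below must stay in sync with it. */
#define TAG_NAME_MAX 255

/* Hypothetical helper: tries to extract a Matlab function name from one
 * source line. Returns 1 and fills `name` on success, 0 otherwise. */
static int match_function_name(const char *line, char name[TAG_NAME_MAX + 1])
{
    assert(line != NULL && name != NULL);

    /* function [a, b] = name(...) */
    if (sscanf(line, " function [%*[^]%]] = %255[A-Za-z0-9_]", name) == 1)
        return 1;

    /* function ret = name(...) */
    if (sscanf(line, " function%*[ \t]%*[A-Za-z0-9_] = %255[A-Za-z0-9_]", name) == 1)
        return 1;

    /* function name(...)   (assumed extra case, no return value) */
    if (sscanf(line, " function%*[ \t]%255[A-Za-z0-9_]", name) == 1)
        return 1;

    return 0;
}
```

Because `%*[ \t]` needs at least one space or tab after `function`, a line such as `functionality = 42` is rejected by all three patterns. An empty output list (`function [] = name`) is not handled by this sketch, since `%*[^]%]` needs at least one character to match.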
So, what do you think? Would `sscanf()` be fast enough, or would it be better to keep matching individual chars and substrings?
...
This could probably be done by checking that, after
`p = (const unsigned char *) strstr((const char *) line, "struct");`
the characters at `p-1` and `p+6` are not alnum (plus all the necessary range checks).
That still won't ignore words in strings (and maybe other corner cases).
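A rough sketch of that boundary check, assuming a plain NUL-terminated `char` line; `find_whole_word` is a hypothetical helper name, not something that exists in the parser. It keeps searching after a rejected hit, so `constructs; y = struct()` still finds the second occurrence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of `word`
 * in `line` that has no alphanumeric character directly before or after it,
 * or NULL if there is none. */
static const char *find_whole_word(const char *line, const char *word)
{
    assert(line != NULL && word != NULL);

    size_t len = strlen(word);
    if (len == 0)
        return NULL;

    for (const char *p = strstr(line, word); p != NULL; p = strstr(p + 1, word))
    {
        /* Range check: p[-1] may only be read when p is not the first char. */
        int left_ok = (p == line) || !isalnum((unsigned char) p[-1]);
        /* p[len] is at worst the terminating NUL, which is not alnum. */
        int right_ok = !isalnum((unsigned char) p[len]);

        if (left_ok && right_ok)
            return p;
    }
    return NULL;
}
```

As written, this follows the "not alnum" rule literally, so an identifier like `my_struct` would still match (the underscore is not alphanumeric), and a `struct` inside a string literal is still found.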
Honestly I think I'd entirely ditch parsing structs; knowing that a certain variable at a certain point in the program is a struct isn't really that relevant, and universal-ctags doesn't do it anyway.  Class parsing would be more useful.
For a similar reason, I'd avoid parsing all variables as universal-ctags does; having a list with EVERY variable assignment in EVERY function in the script seems excessive.  (However, it might be a good idea to list `global` and `persistent` variables.)
-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3358#issuecomment-1369307109

Message ID: geany/geany/pull/3358/c1369307109@github.com
