[Geany-devel] Indentation using regex (was [PATCH 14/19] Rewrite tab switching queue)

Jiří Techet techet at xxxxx
Tue Dec 6 16:38:57 UTC 2011


On Tue, Dec 6, 2011 at 07:20, Lex Trotman <elextr at gmail.com> wrote:
> [...]
>
>> First, note that I wasn't able to find the patch, so I'm only guessing
>> from reading the thread and from my own (much less complete) attempt.
>>
>
> I'm afraid that if I had the patch it is on my broken hard drive :-S
>
> And anyway we never got it to work satisfactorily.
>
>>
>> So.  This looks pretty good for line-based indents (but not brace match
>> :(), but I ran into a really annoying problem with SH:
>>
>> SH should indent after "then", "do", etc. and unindent "fi", "esac",
>> etc.  The problem is that you expect the "fi" line to be unindented
>> (e.g. use unindent_this_line), but if you type "file" for example, it'd
>> wrongly unindent that line too!
>>
>> I thought about unindenting the previous line when entering the \n, but
>> this isn't a real solution either since re-adding a newline after a well
>> indented line would unindent it again.  Crap.
>>
>> So I haven't yet found a sensible solution for this problem -- which
>> wouldn't apply for '}' since it's very unlikely it's part of a bigger
>> word -- and would like to know if anybody got super clever ideas, or how
>> other editors you know handle this.
>>
>> This said, I really like the idea of configurable indentation rules that
>> could handle languages like SH, Pascal, Ruby, Ada, etc. without the need
>> to hard-code it.
>>
>
> WARNING, complex topic, big post :)
>
> Quick summary of ones I know:
>
> Emacs has language specific elisp, for C:
>
> "It analyzes the line to determine its syntactic symbol(s) (the kind
> of language construct it's looking at) and its anchor position (the
> position earlier in the file that CC Mode will indent the line
> relative to). The anchor position might be the location of an opening
> brace in the previous line, for example. See Syntactic Analysis.
> It looks up the syntactic symbol(s) in the configuration to get the
> corresponding offset(s). The symbol +, which means “indent this line
> one more level” is a typical offset. CC Mode then applies these
> offset(s) to the anchor position, giving the indentation for the line.
> The different sorts of offsets are described in c-offsets-alist. "
>
> And it admits that even then it gets it wrong sometimes :(
>
> Eclipse and Netbeans also use parser results for the indent guidance.
>
> I don't think parsing the source for indent guidance is in the Geany
> light and small spirit, so I reject that.
>
> Instead I propose the following "correct most of the time" but simple
> option based on a combination of Jiri's and Emacs' methods:
>
> 1. Each line N has an initial indentation which is the indentation of
> line N-1 plus the increments/decrements for all matches to "indent
> next line" regexes that occur in line N-1.  (Note that each regex has
> a signed count of columns to indent/exdent)

Maybe I don't understand this correctly, but does it mean that when you
open an existing file, you'd re-indent it completely based on the
regexes? I don't think that is a good idea, because it could lead to
whitespace changes on every line when you edit just a single line.

Or does it mean keeping these indent numbers internal and using them
only when auto-indentation is done? I often work with files edited by
many people over many years, which have inconsistent indents. Imagine
the correct indent size is 4 but someone indented the outer "if" block
by just 2. If I insert a new "if" inside this block, the indent will be
6 because of the incorrect outer indent. This is exactly why I used the
"delta" indent solution: it is locally correct and has minimal impact
on (and is minimally affected by) the rest of the code.
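
To make the delta idea concrete, here is a minimal Python sketch (not
the actual patch). The rule table, the function names and the 4-column
step are assumptions for illustration only. As I read the example
above, the point is that the new line's indent is derived from the
previous line's actual whitespace, not from a nesting level computed
over the whole file.

```python
import re

# Hypothetical rule table: (regex, signed column delta for the next line).
NEXT_LINE_RULES = [
    (re.compile(r"\bthen\s*$"), +4),
    (re.compile(r"\bdo\s*$"), +4),
]


def leading_width(line, tab_width=4):
    """Width of the leading whitespace, expanding tabs to tab stops."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += tab_width - (width % tab_width)
        else:
            break
    return width


def delta_indent(prev_line):
    """Indent for a new line: previous line's real indent plus rule deltas."""
    indent = leading_width(prev_line)
    for pattern, delta in NEXT_LINE_RULES:
        if pattern.search(prev_line):
            indent += delta
    return max(indent, 0)
```

With the outer block indented by 2 columns,
delta_indent("  if [ -f x ]; then") gives 6, and a line that matches no
rule simply keeps the previous line's indent.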

One more thing: with global indents you have to be sure that the
regexes catch all the indentation cases (without false positives);
otherwise a single error will affect the indentation everywhere in the
rest of the file. You can do crazy stuff in some languages, so I can
imagine such a thing happening easily (e.g. a single line with the end
of a multi-line comment followed by an end of block followed by another
comment). With delta indentation it's much less critical: the indent
may be incorrect for the next line, but this won't affect the rest of
the file in any negative way. Moreover, you usually don't write things
like the comment example while you are writing the code and need
auto-indentation; you usually add them afterwards, when no
auto-indentation is needed.
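
A rough way to see the difference, as a Python sketch under simplifying
assumptions: each line is reduced to a hypothetical "net level change
for the following line", and global indentation is modelled as a
running sum over the whole file. The names and numbers are made up for
illustration.

```python
def global_indents(deltas, width=4):
    """Absolute indent of every line from a running sum of level changes."""
    indents, level = [], 0
    for delta in deltas:
        indents.append(level * width)
        level = max(level + delta, 0)
    return indents


def wrong_lines(expected, actual):
    """Indices of lines whose indent differs between two results."""
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


# Line 2 closes a block, but in `missed` the regexes fail to notice it.
correct = [+1, 0, -1, 0, 0, 0]
missed = [+1, 0, 0, 0, 0, 0]

shifted = wrong_lines(global_indents(correct), global_indents(missed))
```

Here `shifted` contains every line after the misjudged one, since the
running sum never recovers. With the delta approach only a line typed
directly after the misjudged line would start at the wrong column, and
existing lines are never touched.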

Final remark - better not to auto-indent at all than to indent
incorrectly. There's nothing worse than an editor (actually anything)
which tries to be smart in an annoying way.

Cheers,
Jiri
