[Geany-devel] Indentation using regex (was [PATCH 14/19] Rewrite tab switching queue)

Tue Dec 6 22:20:51 UTC 2011

Le 06/12/2011 07:20, Lex Trotman a écrit :
> [...]
> 
>> First, note that I wasn't able to find the patch, so I'm only guessing
>> from reading the thread and from my own (much less complete) attempt.
>>
> 
> I'm afraid that if I had the patch it is on my broken hard drive :-S
> 
> And anyway we never got it to work satisfactorily.
> 
>>
>> So.  This looks pretty good for line-based indents (but not brace match
>> :(), but I ran into a really annoying problem with SH:
>>
>> SH should indent after "then", "do", etc. and unindent "fi", "esac",
>> etc.  The problem is that you expect the "fi" line to be unindented
>> (e.g. use unindent_this_line), but if you type "file" for example, it'd
>> wrongly unindent that line too!
>>
>> I thought about unindenting the previous line when entering the \n, but
>> this isn't a real solution either since re-adding a newline after a well
>> indented line would unindent it again.  Crap.
>>
>> So I haven't yet found a sensible solution for this problem -- which
>> wouldn't apply for '}' since it's very unlikely it's part of a bigger
>> word -- and would like to know if anybody got super clever ideas, or how
>> other editors you know handle this.
>>
>> This said, I really like the idea of configurable indentation rules that
>> could handle languages like SH, Pascal, Ruby, Ada, etc. without the need
>> to hard-code it.
>>
> 
> WARNING, complex topic, big post :)
> 
> Quick summary of ones I know:
> 
> Emacs has language specific elisp, for C:
> 
> "It analyzes the line to determine its syntactic symbol(s) (the kind
> of language construct it's looking at) and its anchor position (the
> position earlier in the file that CC Mode will indent the line
> relative to). The anchor position might be the location of an opening
> brace in the previous line, for example. See Syntactic Analysis.
> It looks up the syntactic symbol(s) in the configuration to get the
> corresponding offset(s). The symbol +, which means “indent this line
> one more level” is a typical offset. CC Mode then applies these
> offset(s) to the anchor position, giving the indentation for the line.
> The different sorts of offsets are described in c-offsets-alist. "
> 
> And it admits that even then it gets it wrong sometimes :(
> 
> Eclipse and Netbeans also use parser results for the indent guidance.
> 
> I don't think parsing the source for indent guidance is in the Geany
> light and small spirit, so I reject that.

Right, we don't want such a thing.  Moreover it'd need one parser for
each language, something we don't (or do, if we consider
Scintilla/ctags) have and don't want to write.

> Instead I propose the following "correct most of the time" but simple
> option based on a combination of Jiri's and Emacs' methods:
> 
> 1. Each line N has an initial indentation which is the indentation of
> line N-1 plus the increments/decrements for all matches to "indent
> next line" regexes that occur in line N-1.  (Note that each regex has
> a signed count of columns to indent/exdent)
> 
> 2. The line N final indentation is the initial indentation adjusted by
> the increments/decrements for all matches to "indent this line"
> regexes that occur in line N
> 
> Note that this is the indent, not a delta like Jiri's algorithm.  It
> is therefore stable no matter how many times it is calculated.

What do you mean here by "the indent" versus a delta?  If the new
indent isn't computed as "current indent + something * indent size"
(where "current indent" is the previous line's indent), I don't see how
this could fit with configured indent sizes, nor what the advantage
would be.
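
A minimal Python sketch of the two-step rule quoted above, to make the
question concrete.  The rule tables, the function names and the choice
to count in indent levels multiplied by a configured width (rather than
in raw columns, as the quoted text has it) are all assumptions made for
illustration; this is not taken from any patch.

```python
import re

# Hypothetical sh rules: (regex, signed number of indent levels).
INDENT_NEXT_LINE = [(re.compile(r"\b(then|do|else)\s*$"), +1)]
INDENT_THIS_LINE = [(re.compile(r"^\s*(fi|done|else|esac)\b"), -1)]


def leading_width(line, tab_width=8):
    """Width in columns of the leading whitespace of `line`."""
    width = 0
    for ch in line:
        if ch == " ":
            width += 1
        elif ch == "\t":
            width += tab_width - width % tab_width
        else:
            break
    return width


def rule_delta(rules, line):
    """Sum the signed counts over every match of every rule in `line`."""
    return sum(count * sum(1 for _ in rx.finditer(line))
               for rx, count in rules)


def compute_indent(prev_line, line, indent_width=4):
    """Indent for `line`, derived only from `prev_line` and the rules."""
    # Step 1: initial indent comes from line N-1 and its "next line" rules.
    initial = (leading_width(prev_line)
               + indent_width * rule_delta(INDENT_NEXT_LINE, prev_line))
    # Step 2: adjust by the "this line" rules matching line N.
    final = initial + indent_width * rule_delta(INDENT_THIS_LINE, line)
    return max(final, 0)
```

Since the result never reads the current indent of line N itself,
calling compute_indent() again on an already indented line gives the
same value.  The `\b` in the "this line" rule also keeps "file" from
matching once the word is complete.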

> The question is then when to calculate and apply this indent, clearly
> when a line is first created by enter the indent should be applied.
> 
> But what about when line content changes?  Should we:
> 
> 1. calculate the indent each change, and then ripple that through the file
> 2. calculate the indent each change and only apply it to this line
> 3. calculate and apply the indent to lines N and N-1 only on new line
> or user command
> 4. calculate and apply the indent on user command
> 
> Option 1 is rejected because it is expensive and it will destroy
> manually adjusted indentation when editing an existing line and
> because indentation can change as you type causing distracting effects
> (happens with some Emacs indentation styles)
> 
> Option 2 is rejected for the same reasons

I agree that re-indenting the whole file would just be pointless, but
not really with the other points.

OK, it may break a manually tuned indentation, but that'd only be on
that very line and hopefully the regex would be somewhat OK.

A not-that-easy (and very small) improvement would be not to change the
indentation if:

1) the line has a greater indent than the previous line and we would
also add indent (i.e. either there's nothing to do or we'd do the same
thing another way), or
2) the line has a smaller indent than the previous line and we would
also remove indent (just the opposite)

So then, we wouldn't break manual indent if it only changes the width
of the indent to add/remove.  Maybe it's overcomplicated for the very
small gain.
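
For what it's worth, the check itself could be as small as the
following Python sketch.  The function name and the convention of
passing plain column widths are hypothetical; the hard part (computing
the rule-based indent) is assumed to happen elsewhere.

```python
def resolve_indent(prev_indent, current_indent, computed_indent):
    """Pick the indent to apply to a line that already has one.

    prev_indent     -- indent of the previous line, in columns
    current_indent  -- indent the line has now (possibly hand-tuned)
    computed_indent -- indent the regex rules would give the line
    """
    # Keep the existing indent when it already moves the same way,
    # relative to the previous line, as the rules would move it.
    both_deeper = (current_indent > prev_indent
                   and computed_indent > prev_indent)
    both_shallower = (current_indent < prev_indent
                      and computed_indent < prev_indent)
    if both_deeper or both_shallower:
        return current_indent
    return computed_indent
```

With a previous line at 4 columns, a line hand-indented to 6 stays at 6
even if the rules say 8, while a line at 8 that the rules want at 0 is
still corrected.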

> Option 4 is rejected because auto new line indent is really the
> minimum required to be called "auto" indentation

Agreed.

> So that leaves option 3.  The upside is that new lines get a sensible
> indentation automatically, the downside is that lines that should be
> unindented won't be until enter or user command.  I have used another
> editor that worked this way and after a while I became used to it.
> Note that editing an existing line won't destroy manual indentation
> unless you tell it to or create a new line after.

I don't understand why we would change the N-1 line's indent?

> The settings are two ("indent this line", "indent next line") lists of
> pairs of a regex and a signed count.
> 
> These settings are per language so they should come from the filetype files.
> 
> A final thought, as there is now an "apply auto indent" command, if
> there is a selection the auto indent should ripple through the whole
> selection.

Well.  This does seem to look quite OK at first glance, but what's the
real difference with Jiří's (and my) solution, other than using the
previous line's indent as the reference?  -- which seems quite cool
since it'd fix most of the problems I see, mainly re-unindenting
already unindented lines.
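
To illustrate the re-unindenting problem, here is a small Python sketch
comparing the two styles.  Both functions are simplified stand-ins
written for this comparison (spaces only, a single hypothetical "fi"
rule, an assumed width of 4); neither is Jiří's actual code.

```python
import re

FI = re.compile(r"^\s*fi\b")  # hypothetical "unindent this line" rule
WIDTH = 4                     # assumed indent width, in columns


def indent_of(line):
    """Number of leading spaces (tabs are ignored in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def reindent_delta(line):
    """Delta style: shift the line relative to its own current indent."""
    if FI.match(line):
        return " " * max(indent_of(line) - WIDTH, 0) + line.lstrip(" ")
    return line


def reindent_absolute(prev_line, line):
    """Absolute style: derive the indent from the previous line."""
    target = indent_of(prev_line)
    if FI.match(line):
        target = max(target - WIDTH, 0)
    return " " * target + line.lstrip(" ")
```

Applying reindent_delta() twice to a "fi" line unindents it twice,
whereas reindent_absolute() returns the same line no matter how often
it is applied.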

Regards,
Colomban