Colomban, Matthew,
After our discussion on IRC I had a bit more of a thought about it and got to the following position. It isn't much different from where I think we were, but more precisely described (well in my brain anyway).
NOTE: be careful to distinguish "new line" from "newline" when reading; the latter is the character \n.
Auto indentation involves several parts:
a. creating the indentation of new lines when return is pressed
b. adjusting the indentation of existing lines based on content typed into them
c. not screwing up existing indentation when editing the contents of a line (that's something we didn't discuss much)
I think part a is self-explanatory: for most languages the contents of the previous line determine whether the new line has the same indent or is indented more or less (though some languages/layouts must look back further).
Part b covers things like unindenting case: in a switch, unindenting } or unindenting "end" after a Pascal block, and so it can apply to the current line or the previous line (because we pressed newline after the case:, } or end, so it is now the previous line).
Part c means that whatever the triggers for part b are, they don't (often) screw you up when you are editing corrections. As an example, if I had a case: and accidentally deleted the : then typed it again, I wouldn't want the case to unindent a second time. This probably means we can't simply increment/decrement the indentation of a line, or keep a counter of indent level and increment/decrement that, because we don't know whether the adjustment has already been done. Rather, we have to set the indentation +/- relative to the indentation of the previous line. Applying the increment to the level of the previous line makes it stable, but is slightly more work.
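To make the stability point concrete, here is a minimal sketch comparing the two approaches. The function names, the four-space indent width and the list-of-strings buffer are all assumptions made for illustration, not anything from an existing code base.

```python
INDENT_WIDTH = 4  # assumed: one indent level is four spaces


def indent_of(line):
    """Number of leading spaces on a line."""
    return len(line) - len(line.lstrip(" "))


def naive_unindent(lines, i):
    """Unstable: removes one more level every time the trigger fires."""
    width = max(0, indent_of(lines[i]) - INDENT_WIDTH)
    lines[i] = " " * width + lines[i].lstrip(" ")


def relative_unindent(lines, i):
    """Stable: always one level less than the previous line."""
    base = indent_of(lines[i - 1]) if i > 0 else 0
    width = max(0, base - INDENT_WIDTH)
    lines[i] = " " * width + lines[i].lstrip(" ")


def demo(unindent):
    """Fire the same trigger twice on the last line and return that line."""
    lines = ["switch (x) {", "    case 0:", "        foo();", "        case 1:"]
    unindent(lines, 3)  # the ':' is typed
    unindent(lines, 3)  # the ':' is deleted and typed again
    return lines[3]
```

With the naive version the second trigger pulls the case all the way back to column 0, while the relative version leaves it one level below foo(); however many times the : is retyped.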
We can always wait for newline before adjusting what is now the previous line, or we can trigger adjustment before the newline, triggered by things like the : or }. I think emacs left it until newline until recently; that introduces fewer problems but can be visually annoying. The case of "end" would need to trigger on something rather complicated to stop a line like ending = 1 from unindenting, so that case should always be left to newline.
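A small sketch of why "end" is easier to handle at newline: by then the whole line is known, so a plain regex is enough. The pattern and the function name below are assumptions for a Pascal-like layout, not a tested rule set.

```python
import re

# Assumed pattern: a line holding only "end", optionally followed by ; or .
END_LINE = re.compile(r"^\s*end\s*[;.]?\s*$")


def should_unindent_on_newline(line):
    """Decide at newline time whether the finished line closes a block."""
    return END_LINE.match(line) is not None
```

Checking the same regex as each character is typed would fire on the partial line "    end" while the user is still on the way to "    ending = 1", which is the problem described above. Checked at newline, "    ending = 1" simply does not match.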
Most languages are usually laid out in relatively simple indentation schemes and these decisions can be made on the basis of a number of sets of trigger character/regex pairs, but there needs to be a way of augmenting or replacing this with specific code for some languages (eg Haskell).
So the settings for a default and fairly useful (but not universal) indenter are:
1. Triggered by newline:
a. a regex to be applied to the line before the newline; on a match, set the indent of that line to one level less than the line before it
b. a regex to be applied to the line before the newline; on a match, set the indent of the new line to one level greater than the line before the newline
2. A list of trigger character/regex pairs: when the trigger character is typed the regex is tried on this line, and a match sets the indent of this line to one level greater than the previous line
3. A list of trigger character/regex pairs: when the trigger character is typed the regex is tried on this line, and a match sets the indent of this line to one level less than the previous line
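The settings above could be sketched roughly as follows. Everything here is an assumption for illustration: the dictionary keys, the C-like regexes, the four-space indent and the buffer modelled as a list of strings with the cursor at the end of the last line.

```python
import re

INDENT = "    "  # assumed: one indent level is four spaces

# Hypothetical settings for a C-like filetype; names and regexes are illustrative.
C_LIKE = {
    "newline_unindent_prev": re.compile(r"^\s*\}\s*$"),  # setting 1a
    "newline_indent_new": re.compile(r"\{\s*$"),         # setting 1b
    "indent_triggers": {},                               # setting 2: char -> regex
    "unindent_triggers": {                               # setting 3: char -> regex
        ":": re.compile(r"^\s*(case\b.*|default\s*):$"),
    },
}


def level(line):
    """Indent of a line in whole levels (leading spaces only)."""
    return (len(line) - len(line.lstrip(" "))) // len(INDENT)


def with_level(line, n):
    """Return the line re-indented to n levels (never below zero)."""
    return INDENT * max(0, n) + line.lstrip(" ")


def on_newline(lines, settings):
    """Return was pressed at the end of the last line; append the new line."""
    cur = len(lines) - 1
    if cur > 0 and settings["newline_unindent_prev"].search(lines[cur]):
        lines[cur] = with_level(lines[cur], level(lines[cur - 1]) - 1)
    new_level = level(lines[cur])
    if settings["newline_indent_new"].search(lines[cur]):
        new_level += 1
    lines.append(with_level("", new_level))


def on_char(lines, ch, settings):
    """A character was just typed at the end of the last line."""
    cur = len(lines) - 1
    if cur == 0:
        return
    for delta, table in ((1, settings["indent_triggers"]),
                         (-1, settings["unindent_triggers"])):
        rx = table.get(ch)
        if rx is not None and rx.search(lines[cur]):
            lines[cur] = with_level(lines[cur], level(lines[cur - 1]) + delta)
            return
```

This follows the settings literally, so it inherits their limits: for example a } typed directly after the line that opened the block is set one level less than that line, which is wrong for a nested empty block. Cases like that are where language specific code would have to take over.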
Separate (existing) code is required for brace matching (and could be made even more useful if ()[] worked as well).
The augmentation just needs to be provided by a registered function which is triggered by a set of characters. If the default indenter was also triggered then the registered function would be called after the default indenter has finished. Given the potential complexity of some languages (Lisp, Haskell come to mind) the whole buffer needs to be available to them.
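One possible shape for that registration, as a sketch only: the registry, the function names and the callback signature (buffer, line number, character) are all hypothetical, and the Lisp rule is deliberately crude. The point is just the ordering and that the hook sees the whole buffer.

```python
from collections import defaultdict

# Hypothetical registry: filetype -> trigger character -> registered functions
_hooks = defaultdict(lambda: defaultdict(list))


def register_indenter(filetype, trigger_chars, func):
    """Register func to run when any of trigger_chars is typed in filetype."""
    for ch in trigger_chars:
        _hooks[filetype][ch].append(func)


def handle_char(filetype, ch, buffer, line_no, default_indenter=None):
    """Run the default indenter first (if it was triggered), then the hooks."""
    if default_indenter is not None:
        default_indenter(buffer, line_no, ch)
    for func in _hooks[filetype].get(ch, ()):
        func(buffer, line_no, ch)


def lisp_newline(buffer, line_no, ch):
    """Crude example hook: two spaces per unclosed '(' in all earlier lines."""
    text = "\n".join(buffer[:line_no])
    depth = max(0, text.count("(") - text.count(")"))
    buffer[line_no] = "  " * depth + buffer[line_no].lstrip(" ")


register_indenter("lisp", "\n", lisp_newline)
```

Here the caller decides whether the default indenter was triggered and passes it in; only the hooks registered for the filetype of the current document are looked up.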
Of course all of this is per filetype, and only the settings of the filetype of the current document would be checked.
Cheers Lex