Re: [Geany-devel] [CODE REVIEW] Changes to encodings

17 May 2011

      ...
IMO, it's far superior to the current way where it opens the file but chops
off everything after the first NUL character.  At least with the changes,
most things will work and the whole text file can be loaded.  I think this
will cause less bug reports than there are currently on this NUL truncation
bug.
I agree its better to load the file, but I don't know that it will get
less bug reports if I know users (being one) :-)
My main concern with all these comments is to get a situation where
there are as few surprises as possible, surprises are what confuses
and upsets users.
...
...
...
...

editor.c and others including plugins use sci_add_text a number of

times, will any of that break? If so it needs to be disabled or fixed.
Using the old function will just lop off the end of a string with
embedded
NULs because it uses strlen(), but it will still add the truncated text
fine.
Well truncated isn't right, so its not fine from a user standpoint.
Agreed that truncation isn't right, but it's how Geany is currently handling
everything with NUL characters, my changes only improve this situation to
some degree.
And I might add that even though we see it as an improvement and are
thankful, users just take it as a given and go on to the next issue.
Sigh...
...
...
...
...

do searches work with files containing NULs?

Seem to work as expected.
Hmmm, wonder how, the regex engine takes a NUL terminated string.  I
would expect the Scintilla searches to work, but not regexes (well not
past the first NUL anyway), so maybe they need disabling or need to be
made to continue past the NUL.  Since one of the main reasons put
forward in the bugs you listed for opening files with NULs is lightly
corrupted log files, being able to search them is important.
Just tested searching with a regex, it works up until the first NUL
character.  I'll have to see if it's something in the regex implementation
or an assumption in Geany that NULs don't exist.  The regex search in SciTE
works fine in this case.
Scite works because it uses Scintilla's built in but limited regex
engine, we use the superior system or GNU regex engine, but it only
takes a NUL terminated string to search.
I don't think this one is fixable, lets put it on the list of things
that don't work to be documented.  Search is limited to text only.
...
...
...
...

How do selections work with embedded NULs, AFAICT all the selection

calls use NUL termination?
Selecting with the mouse/keyboard works the same.  As far as selecting
programatically, assuming the Scintilla wrappers are more or less
directly
wrapping the Scintilla messages, there shouldn't be a problem.  I'll
check
these out shortly, but at glance, they look fine.
But what we select gets truncated when pasted.
Just tested this, and when pasting text selected/copied with embedded NUL,
it does truncate at the first NUL.  Again, I'll need to see if this is
something can be fixed from the Geany side.  SciTE behaves the same way
currently.  In theory, we know where the selection starts and ends, so it
should be possible copy the text and know the length.
No, cut/paste happens in Scintilla we can't do anything about it, but
my guess would be that it uses the GTK clipboard, and that only takes
NUL terminated strings.
Again I think, document and leave.
...
...
...
...

All the places sci_get_string, sci_get_line, sci_get_text,

sci_get_contents and any others that get/set text using NUL
termination need to be disabled or fixed.
The sci_get_string() function properly finds out the size of the string
using a Scintilla message and then ensuring a final NUL-terminator, so it
should work fine.  Any code calling these functions will either truncate
the
returned string at the first NUL or handle it completely. Same goes for
the
other sci_get_*() functions.
Yes, but what does the code that used the call do, it has to stop at
the first NUL in the file since it doesn't have any other length info.
 Therefore it likely doesn't do all it should.
Agreed, but still an improvement over the current handling.  Functions like
sci_get_contents() and sci_get_text() require a length parameter, so the
calling code does have length info already.
So long as truncation doesn't do something destructive.
...
...
...
...

All of the above goes for plugins too.

Aside from the issues identified with the API breakage, the worst case
scenario should be truncation of the string I think.
Which again means that it is likely to not work right.
In the same way as Geany, probably yes.
So long as they don't do something unexpectedly destructive or crash.
You've identified the VCS plugin already.
...
...
...
...

Do lexers work with embedded NULs or does such a file have to be

filetype None, if so it needs to be enforced.
At least the CPP lexer works fine, others to be tested, though I don't
think
they rely too much (or at all) on NUL termination.
On further thought I guess if they work with Scite they will be ok.
SciTE actually handles text files with NULs quite nicely, it doesn't even
mark them as read-only or anything and it saves them no problems.
Well the main reason for the read-only is the truncation, so you don't
write a short file over the long one.  Now you've fixed that, but I
still think its worth leaving it read-only.
...
...
...
Probably a good idea would be to improve the warning dialog that pops up
when you open a file with embedded NULs that explains that some functions
may not work properly.
Yes, I think that there would definitely need to be some warning.
That means that you have to check the file for NUL, can this be coded
in your encoding coding? :-D  So the file doesn't need scanning again.
It's done and working[1] in a similar fashion as before (sans the
truncation), where it compares the full data size with strlen() of the data
and if they're different, passes read-only back as true, which causes the
warning dialog to show and the doc to be made read-only.
Ok.
...
...
If we let the user edit the file, the main thing that must work is
finding and deleting the NULs so the user can get the file back to
being a text file and the search capabilities.
Already works fine, you just turn read-only off, select the NUL character
and delete it.  One note about "get the file back to being a text file" is
that a file with embedded NULs is actually a perfectly valid text file.
Agree, I meant C's definition of "text file".
The
...
problem in Geany, is that it assumes that a text file is the same thing as a
C string.  Any file that contains all valid characters in it's encoding (of
which NUL is one of), is a valid text file.
Agree.
Cheers
Lex

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

Re: [Geany-devel] [CODE REVIEW] Changes to encodings