[geany/geany] Cannot open uncompressed PDF file (Issue #3677)

abdulbadii

5 Nov 2023

6:54 p.m.

I need to edit an uncompressed PDF file, but Geany suddenly seems to ignore it completely, i.e. it cannot open that uncompressed PDF file. Does anyone know what is actually going on?

-- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/issues/3677 You are receiving this because you are subscribed to this thread. Message ID: geany/geany/issues/3677@github.com

Jan Dolinár

5 Nov

7:12 p.m.

My guess would be a problem with encoding. Did you check the status tab in the bottom panel? It should tell you if there is some issue with the file being opened.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1793807502

abdulbadii

7:30 p.m.

There's a message: "File 'some.PDF' does not look like a text file or the file encoding not supported"

Until now I thought Geany was capable of opening files in any code page.

Any viable workaround?

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1793811296

elextr

6 Nov

12:55 a.m.

Some points:

1. Geany will not load any file that contains NUL bytes after its encoding has been converted to UTF-8.
2. There is no guaranteed way to detect the encoding of a file.
3. PDFs can contain image data, which can contain NUL bytes and is not text that can be converted to UTF-8.
4. The message "File 'some.PDF' does not look like a text file or the file encoding not supported" means that one of the above occurred for every encoding Geany knows.
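
To make these points concrete, here is a minimal Python sketch of that kind of check. It is not Geany's code (Geany is written in C and its real detection logic is more involved); the `try_load` name and the three-encoding candidate list are hypothetical, chosen only for illustration.

```python
# Hypothetical, simplified model of the load check described above; this is
# not Geany's code, and the candidate list is an assumption.
CANDIDATES = ["utf-8", "cp1252", "iso-8859-1"]


def try_load(data: bytes, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate encoding that
    decodes the data without errors and without NUL characters.
    If none qualifies, return (None, reasons)."""
    reasons = []
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict: raises on invalid byte sequences
        except UnicodeDecodeError:
            reasons.append(f"{enc}: not valid")
            continue
        if "\x00" in text:  # a NUL survived the conversion: reject (point 1)
            reasons.append(f"{enc}: contains NUL")
            continue
        return enc, text
    # Every candidate failed: the "does not look like a text file" case (point 4).
    return None, reasons
```

With a PDF-like input such as `b"%PDF-1.4\nstream\n\x89PNG\x00\x00\xff\nendstream\n"`, UTF-8 rejects the `0x89` byte, and both single-byte candidates decode it but keep the NULs, so every candidate fails.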

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1793887808

Colomban Wendling

2:58 p.m.

...

> Any viable workaround?

Viable, I don't know, but I've heard of people working around NUL bytes by replacing them with a placeholder and changing them back after editing. Something like `sed -i 's/\x00/%%NUL%%/g' file && geany -i file && sed -i 's/%%NUL%%/\x00/g' file`, or along those lines. Something like that should work, but I've never used it myself.
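
The same placeholder idea can be sketched in Python working on raw bytes. The function names and the `%%NUL%%` placeholder are illustrative only. The one thing this adds over the one-liner is a round-trip check: it refuses input where restoring would not reproduce the original bytes, for example when the file already contains `%%NUL%%`, or text that combines with a placeholder to form one. It only deals with NULs; the remaining bytes still have to be decodable in some encoding Geany accepts.

```python
# Sketch of the placeholder workaround on raw bytes; names are hypothetical.
PLACEHOLDER = b"%%NUL%%"


def restore_nuls(data: bytes, placeholder: bytes = PLACEHOLDER) -> bytes:
    """Turn every placeholder back into a NUL byte."""
    return data.replace(placeholder, b"\x00")


def hide_nuls(data: bytes, placeholder: bytes = PLACEHOLDER) -> bytes:
    """Replace NUL bytes with the placeholder, but only if the change can be
    undone exactly; otherwise raise ValueError."""
    hidden = data.replace(b"\x00", placeholder)
    # If the input already contains the placeholder (or text that merges with
    # an inserted placeholder into a new match), restoring would corrupt it.
    if restore_nuls(hidden, placeholder) != data:
        raise ValueError("placeholder is ambiguous for this input")
    return hidden
```

Typical use: write `hide_nuls(original)` to a file, edit it, then write `restore_nuls(edited)` back, taking care not to type the placeholder text while editing.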

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1794883790

elextr

7 Nov 7 Nov

1:37 a.m.

Still, the image data may not be convertible to UTF-8; it's just a sequence of bytes, not text in any encoding.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1797086716

Thomas Martitz

7:22 a.m.

base64 then, the solution to just about anything?

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1797890095

elextr

7:36 a.m.

...

> base64 then, the solution to just about anything?

:-)

Well, base64 would be a good solution if the iconv encoding converters recognised the image data inside the PDF and converted it to base64, instead of just crapping out when the random bytes in the image do not form a valid encoding. (The same applies to images or any other binary data that can be embedded in a PDF.)

If base64 or any other NUL-free format is allowed in PDFs, maybe the OP could use one of the PDF converters to re-encode the file that way before editing it in Geany.
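
PDF does define NUL-free stream encodings: the standard `ASCIIHexDecode` and `ASCII85Decode` filters store stream data as ASCII text. The sketch below shows only the ASCIIHex data format (pairs of hex digits, whitespace ignored, `>` marking end of data, and an odd final digit treated as if followed by `0`); the function names are hypothetical. Actually re-encoding a stream inside a PDF would also require updating its `/Filter` and `/Length` entries and the cross-reference offsets, which this sketch does not attempt.

```python
# Sketch of the ASCIIHexDecode data format only; names are hypothetical and
# no PDF structure (dictionaries, xref table) is handled here.


def ascii_hex_encode(raw: bytes, line_len: int = 64) -> bytes:
    hex_digits = raw.hex().upper()
    # The filter ignores whitespace, so wrap lines to keep them editable.
    lines = [hex_digits[i:i + line_len] for i in range(0, len(hex_digits), line_len)]
    return ("\n".join(lines) + ">").encode("ascii")  # ">" is the end-of-data marker


def ascii_hex_decode(encoded: bytes) -> bytes:
    text = encoded.decode("ascii").split(">", 1)[0]  # stop at end-of-data marker
    digits = "".join(text.split())                    # drop all whitespace
    if len(digits) % 2:
        digits += "0"  # odd digit count: behave as if a 0 followed the last digit
    return bytes.fromhex(digits)
```

Binary data such as `b"\x89PNG\x00\x00\xff"` becomes `b"89504E470000FF>"`, which contains no NUL bytes and decodes back to the original.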

But probably the best answer is for the OP to use an actual PDF editor, not try to edit PDF content in a text editor.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1797900923

Colomban Wendling

3:41 p.m.

...

> Still the image data may not be convertible to UTF-8, it's just a sequence of bytes, not any encoding.

UTF-8 no, but many encodings are actually "a sequence of bytes", which is why choosing the right one when opening a file is so hard (basically, it's a guessing game if you don't already know the answer). *Any* stream of bytes is gonna be valid in one of the ISO-8859-* encodings; the only reason we don't accept them is that we refuse NUL bytes.
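
A quick Python check of this claim, using ISO-8859-1 as the example: Python's codec for it maps every byte value 0x00 to 0xFF to a character, so decoding arbitrary bytes cannot fail, while UTF-8 rejects many byte sequences. The `valid_utf8` helper is an illustrative name, not part of any library.

```python
# ISO-8859-1 (Latin-1) assigns a character to every byte value, so any
# byte stream decodes, and re-encoding gives back the same bytes.
every_byte = bytes(range(256))

latin1_text = every_byte.decode("iso-8859-1")  # never raises
round_trips = latin1_text.encode("iso-8859-1") == every_byte


def valid_utf8(data: bytes) -> bool:
    """True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 is stricter: a lone byte in 0x80-0xFF is not a valid sequence.
utf8_ok = valid_utf8(every_byte)
```

Here `round_trips` is `True` and `utf8_ok` is `False`.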

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798686612

elextr

3:52 p.m.

OK, ... convertible to UTF-8 with no NULs ...

Picky picky mumble mumble... :-)

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798756647

Colomban Wendling

3:54 p.m.

Yes, any ISO-8859-* stream with no NULs is convertible to UTF-8 with no NULs. Or am I missing something? I don't think any of the ISO-8859-* encodings has characters Unicode cannot represent :)
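
Why NUL-free input stays NUL-free: UTF-8 only ever emits a `0x00` byte for U+0000 itself, because every byte of a multi-byte sequence is at least `0x80`. The short Python sketch below checks this for ISO-8859-1 input, and also shows the flip side raised in the reply below: a NUL already present in the input (for example from image data) is carried straight through. The `latin1_to_utf8` name is illustrative.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as ISO-8859-1 and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# Every byte value except 0x00: the converted output contains no NUL byte.
nul_free = bytes(range(1, 256))
converted = latin1_to_utf8(nul_free)

# A NUL in the input is preserved as a NUL in the output.
with_nul = latin1_to_utf8(b"text\x00more")
```

Bytes 0x01 to 0x7F stay single bytes and 0x80 to 0xFF become two-byte sequences, so `converted` is 127 + 2 × 128 = 383 bytes long.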

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798767995

elextr

3:58 p.m.

Yes, but image data is very likely to contain NUL bytes, which those encodings convert to NULs IIRC. My point is that the text in the PDF will be convertible by some encoding just fine, but embedded images run through the same converter will turn into garbage, likely with NULs. The iconv converters don't know about embedded images, so they can't skip them.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798777828

Colomban Wendling

4:06 p.m.

Sure, it won't be a convenient experience, and if the non-binary data is not single-byte encoded it's unlikely to be really usable, as even stripping/replacing the NULs will not let it be decoded as that multi-byte encoding if the encoding has stricter rules than "any byte goes anywhere" (like UTF-8).
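
A small Python illustration of this: UTF-8 text followed by arbitrary binary bytes (here, purely as an example, the start of a PNG file: its signature and the IHDR chunk header) is still not valid UTF-8 after the NULs are removed, and falling back to a single-byte encoding decodes it but mangles the multi-byte text. The helper names are illustrative.

```python
def strip_nuls(data: bytes) -> bytes:
    return data.replace(b"\x00", b"")


def valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 text followed by example binary bytes (PNG signature + IHDR header).
text_part = "Grüße\n".encode("utf-8")
binary_part = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
mixed = text_part + binary_part

stripped = strip_nuls(mixed)
# Removing the NULs does not make the binary part valid UTF-8 ...
stripped_is_utf8 = valid_utf8(stripped)
# ... and the single-byte fallback succeeds but garbles the UTF-8 text.
fallback = stripped.decode("iso-8859-1")
```

`stripped_is_utf8` is `False`, and `fallback` starts with `"GrÃ¼"` rather than `"Grü"`.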

Anyway, any solution for editing binary data is gonna be sub-optimal if it's not specialized for that type of data. Geany knows binary data that represents text; anything else it doesn't. Even real hex editors are usually a pain if they don't have specific support for the format -- but still, they let you do *some* useful things sometimes.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798798395

elextr

4:16 p.m.

So to summarise, for PDF files use a PDF editor, for image files use an image editor, for pure text files use Geany.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1798845230

Enrico Tröger

20 Dec

1:51 p.m.

I propose to close this; it's highly unlikely Geany will ever become a PDF editor. Or is there any reason to keep this open?

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1864422905

Colomban Wendling

2:02 p.m.

Yeah it's probably fine to close.

FWIW, I have code on top of my encoding PRs for opening binary files, but that's limited to the loading and encoding management, not adapting all code to work with NULs. Yet, search seems to work fairly well @elextr 😉 Anyway, I'm not sure we're gonna merge it, as it's not 100% trivial and doesn't necessarily add a lot of value if most features are half-broken -- however it could help with viewing broken log files or fixing a tiny corruption.

-- View it on GitHub: https://github.com/geany/geany/issues/3677#issuecomment-1864436760

github-comments@lists.geany.org

15 comments

6 participants

participants (6)

abdulbadii
Colomban Wendling
elextr
Enrico Tröger
Jan Dolinár
Thomas Martitz