<p>With newer versions of libxml2 (used by lxml), tostring() fails with a UnicodeEncodeError when no encoding argument is passed. Passing "unicode" as the encoding makes tostring() return a Python unicode string directly, so we no longer need to decode the result.</p>
<p>On Debian Sid, which ships libxml2 2.9.12, the following error occurs without this change:</p>
<div class="snippet-clipboard-content position-relative"><pre><code>/usr/bin/python3 ../scripts/gen-api-gtkdoc.py xml -d . -o geany-gtkdoc.h \
                --sci-output geany-sciwrappers-gtkdoc.h
Traceback (most recent call last):
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 460, in &lt;module&gt;
    sys.exit(main(sys.argv))
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 389, in main
    e = DoxyStruct.from_compounddef(n0)
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 321, in from_compounddef
    e.add_member(p)
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 233, in add_member
    proc.process_element(xml.find("detaileddescription"))
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 136, in process_element
    s = self.__process_element(xml)
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 163, in __process_element
    s += self.__process_element(n) + "\n"
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 167, in __process_element
    ss = self.at.cb(n.get("kind"), self.__process_element(n))
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 163, in __process_element
    s += self.__process_element(n) + "\n"
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 170, in __process_element
    s += self.get_program_listing(n)
  File "/build/geany-1.37.1-1+20210903gitb7bd5fa/doc/../scripts/gen-api-gtkdoc.py", line 126, in get_program_listing
    arr.append("  " + tostring(etree.HTML(html), method="text").decode("utf-8"))
  File "src/lxml/etree.pyx", line 3437, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 103, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 75, in lxml.etree._textToString
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 130970: ordinal not in range(128)
</code></pre></div>
<p>I'm not completely sure why this happens with libxml2 2.9.12 (2.9.10 works fine); the XML content processed here should be plain ASCII. In any case, setting the encoding explicitly should do no harm.</p>
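<p>As a rough illustration of the idea, the sketch below serializes a small snippet to text with encoding="unicode" and gets a str back, so no decode step is needed. The helper name listing_to_text and the sample markup are made up for this example, and it parses with fromstring() rather than the etree.HTML() call the script uses. It falls back to the standard library's ElementTree (which accepts the same argument) only so that it runs without lxml installed.</p>

```python
try:
    from lxml import etree  # the library gen-api-gtkdoc.py uses
except ImportError:
    # Fallback for illustration only: the standard library's ElementTree
    # accepts the same encoding="unicode" argument to tostring().
    import xml.etree.ElementTree as etree


def listing_to_text(markup):
    """Return the text content of a markup snippet as a str (hypothetical helper)."""
    root = etree.fromstring(markup)
    # encoding="unicode" makes tostring() return a str instead of bytes,
    # so the former .decode("utf-8") call is no longer needed.
    return etree.tostring(root, method="text", encoding="unicode")


# Sample input containing non-ASCII characters, including '\xe1'
# (the character named in the traceback).
text = listing_to_text("<pre>caf\u00e9 <b>ol\u00e1</b></pre>")
print(text)
```

<p>Since the result is already a str, it can be concatenated with the "  " prefix directly, as the script does in get_program_listing().</p>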
<p>To reproduce, start a Docker container with a Debian Sid image, e.g. <code>docker run --rm -it debian:sid</code>, and run the following inside the container:</p>
<div class="highlight highlight-source-shell position-relative"><pre>apt-get update <span class="pl-k">&&</span> apt-get install --no-install-recommends -y git intltool libtool build-essential libgtk-3-dev python3-docutils rst2pdf doxygen python3-lxml nano
git clone https://github.com/geany/geany
<span class="pl-c1">cd</span> geany
./autogen.sh
make -C doc</pre></div>

<hr>

<h4>You can view, comment on, or merge this pull request online at:</h4>
<p>  <a href='https://github.com/geany/geany/pull/2885'>https://github.com/geany/geany/pull/2885</a></p>

<h4>Commit Summary</h4>
<ul>
  <li>Use "encoding" keyword argument for lxml's tostring()</li>
</ul>

<h4>File Changes</h4>
<ul>
  <li>
    <strong>M</strong>
    <a href="https://github.com/geany/geany/pull/2885/files#diff-7c43936cb8aa12a20a763c71614ec50c9f262c1c683ee12d90bf1e3caec18370">scripts/gen-api-gtkdoc.py</a>
    (6)
  </li>
</ul>

<h4>Patch Links:</h4>
<ul>
  <li><a href='https://github.com/geany/geany/pull/2885.patch'>https://github.com/geany/geany/pull/2885.patch</a></li>
  <li><a href='https://github.com/geany/geany/pull/2885.diff'>https://github.com/geany/geany/pull/2885.diff</a></li>
</ul>
