Convert Python ElementTree To String


Answer:

Element objects have no .getroot() method. Drop that call, and the .tostring() call works:


xmlstr = ElementTree.tostring(et, encoding='utf8', method='xml')

You only need to use .getroot() if you have an ElementTree instance.
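A minimal Python 3 sketch of the distinction, assuming the tree is built in memory (root and tree are hypothetical names). An ElementTree wraps a root Element and .getroot() unwraps it. A bare Element has no such method, so you pass it to tostring() directly.

```python
from xml.etree import ElementTree

# Hypothetical root element; in practice it might come from ElementTree.fromstring()
root = ElementTree.Element("Person", Name="John")

# Wrapping the element in an ElementTree gives an object that *does* have .getroot()
tree = ElementTree.ElementTree(root)

# Serialize from the ElementTree by first unwrapping its root element...
from_tree = ElementTree.tostring(tree.getroot(), encoding='utf8', method='xml')

# ...or serialize the Element directly (root.getroot() would raise AttributeError)
from_element = ElementTree.tostring(root, encoding='utf8', method='xml')

print(from_tree == from_element)  # True
print(hasattr(root, "getroot"))   # False
```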


Other notes:



  • This produces a bytestring, which in Python 3 is the bytes type.

    If you must have a str object, you have two options:



    1. Decode the resulting bytes value, from UTF-8: xmlstr.decode("utf8")



    2. Use encoding='unicode'; this avoids an encode / decode cycle:


      xmlstr = ElementTree.tostring(et, encoding='unicode', method='xml')




  • If you want the UTF-8 encoded bytestring value or are using Python 2, note that ElementTree doesn't recognize utf8 (without a dash) as the standard XML encoding, so it'll add a <?xml version='1.0' encoding='utf8'?> declaration. Use utf-8 or UTF-8 (with a dash) to prevent this. When using encoding="unicode", no declaration header is added.
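The options in these notes can be compared side by side in a short Python 3 sketch, using a hypothetical element in place of et. The utf8 and utf-8 calls both return bytes, but only the dashless spelling gets the XML declaration. encoding='unicode' returns a str with no declaration.

```python
from xml.etree import ElementTree

# Hypothetical element standing in for `et` from the answer
et = ElementTree.Element("Person", Name="John")

# 'utf8' (no dash) is not treated as a standard name, so a declaration is prepended
with_decl = ElementTree.tostring(et, encoding='utf8', method='xml')

# 'utf-8' (with a dash) also returns bytes, but without the declaration
no_decl = ElementTree.tostring(et, encoding='utf-8', method='xml')

# Option 1: decode the UTF-8 bytes to get a str
decoded = no_decl.decode("utf8")

# Option 2: ask for a str directly, skipping the encode/decode cycle
as_text = ElementTree.tostring(et, encoding='unicode', method='xml')

print(type(with_decl).__name__, with_decl)
print(type(no_decl).__name__, no_decl)
print(type(as_text).__name__, as_text)
```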





How do I convert ElementTree.Element to a String?


For Python 3:


xml_str = ElementTree.tostring(xml, encoding='unicode')

For Python 2:


xml_str = ElementTree.tostring(xml, encoding='utf-8')

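The following is compatible with both Python 2 & 3, but because the default encoding is US-ASCII, any non-ASCII characters are written as numeric character references (such as &#233; for é) rather than as the characters themselves: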


xml_str = ElementTree.tostring(xml).decode()
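This Python 3 check shows how the version-neutral form handles characters outside ASCII, using a hypothetical name containing é. With no encoding argument, tostring() uses US-ASCII and escapes é as a character reference. .decode() still succeeds, and an XML parser reads the same value back. Only the raw text differs from the encoding='unicode' result.

```python
from xml.etree import ElementTree

# Hypothetical element with a non-ASCII attribute value
xml = ElementTree.Element("Person", Name="José")

# Default encoding (US-ASCII): é is written as the character reference &#233;
default_str = ElementTree.tostring(xml).decode()

# encoding='unicode' keeps é as a literal character
unicode_str = ElementTree.tostring(xml, encoding='unicode')

print(default_str)  # <Person Name="Jos&#233;" />
print(unicode_str)  # <Person Name="José" />

# Parsing the escaped form recovers the original attribute value
print(ElementTree.fromstring(default_str).get("Name"))  # José
```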



Example usage


from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml).decode()
print(xml_str)

Output:


<Person Name="John" />



Explanation


Despite what the name implies, ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue in Python 3, which uses Unicode for strings.



In Python 2 you could use the str type for both text and binary data.
Unfortunately this confluence of two different concepts could lead to
brittle code which sometimes worked for either kind of data, sometimes
not. [...]


To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.



Source: Porting Python 2 Code to Python 3


If we know what version of Python is being used, we can specify the encoding as unicode or utf-8. Otherwise, if we need compatibility with both Python 2 & 3, we can use decode() to convert into the correct type.
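Here is one way to write the version check, sketched as a small helper. The name element_to_text is hypothetical and not part of ElementTree. On Python 3 it requests a str directly. On Python 2 it avoids encoding='unicode' (which raises LookupError there) and decodes UTF-8 bytes instead.

```python
import sys
from xml.etree import ElementTree

def element_to_text(elem):
    """Hypothetical helper: serialize an Element to a text string on Python 2 or 3."""
    if sys.version_info[0] >= 3:
        # Python 3: 'unicode' returns a str directly, with no XML declaration
        return ElementTree.tostring(elem, encoding='unicode')
    # Python 2: encode to UTF-8 bytes (no declaration for 'utf-8'), then decode
    return ElementTree.tostring(elem, encoding='utf-8').decode('utf-8')

xml = ElementTree.Element("Person", Name="John")
print(element_to_text(xml))  # <Person Name="John" />
```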


For reference, I've included a comparison of .tostring() results between Python 2 and Python 3.


ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode

ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />
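To reproduce the Python 3 column on your own interpreter, a short script like this one prints the result type next to each call. It is a sketch that assumes Python 3, since the 'unicode' call fails on Python 2.

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")

# Each call from the comparison, keyed by a short label
results = {
    "default": ElementTree.tostring(xml),
    "unicode": ElementTree.tostring(xml, encoding='unicode'),
    "utf-8": ElementTree.tostring(xml, encoding='utf-8'),
    "decode()": ElementTree.tostring(xml).decode(),
}

for label, value in results.items():
    # type(...).__name__ is 'bytes' or 'str'; repr() shows the b'' prefix on bytes
    print(label, type(value).__name__, repr(value))
```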

Thanks to Martijn Pieters for pointing out that the str datatype changed between Python 2 and 3.




Why not use str()?


In most scenarios, using str() would be the "canonical" way to convert an object to a string. Unfortunately, calling it on an Element returns only a short summary: the tag name and the object's memory address in hexadecimal. It does not return a string representation of the element's data.


from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
print(str(xml)) # <Element 'Person' at 0x00497A80>
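For contrast, here is a minimal Python 3 comparison. str() gives the debugging summary, whose address changes from run to run. tostring() with encoding='unicode' gives the serialized markup, attributes included.

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")

# str() gives a summary: the tag plus a hex address that varies per run
summary = str(xml)

# tostring() serializes the element's data as XML text
markup = ElementTree.tostring(xml, encoding='unicode')

print(summary)  # e.g. <Element 'Person' at 0x...>
print(markup)   # <Person Name="John" />
```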

