Processing in a string html and xml
You want the HTML perhaps XML Entities such as &entity; or &#code; Replace with the corresponding text . also , You need to convert specific characters in the text ( such as <, >, or &).
If you want to replace... In a text string ‘<’ perhaps ‘>’ , Use html.escape() Functions can be done easily . such as ：
'Elements are written as "<tag>text</tag>".' import html print(s) Elements are written as "<tag>text</tag>". print(html.escape(s)) Elements are written as "<tag>text</tag>". # Disable escaping of quotes print(html.escape(s, quote=False)) Elements are written as "<tag>text</tag>". >>>s =
If you're dealing with ASCII Text , And want to make the wrong ASCII The encoding entity corresponding to the text is embedded , You can give some I/O Function transfer parameters errors='xmlcharrefreplace' To achieve this goal . such as ：
'Spicy Jalapeño' > s.encode('ascii', errors='xmlcharrefreplace') b'Spicy Jalapeño' >> s =
In order to replace the encoding entity in the text , You need to use another method . If you're dealing with HTML perhaps XML Text , Try to use a suitable one first HTML perhaps XML Parser . Usually , These tools will automatically replace these encoded values , You don't have to worry about .
occasionally , If you receive some raw text with encoded values , It needs to be replaced manually , You usually just use HTML perhaps XML Some related tool functions of parser / The method can . such as ：
'Spicy "Jalapeño".' > from html.parser import HTMLParser > p = HTMLParser() > p.unescape(s) 'Spicy "Jalapeño".' > >>> t = 'The prompt is >>>' > from xml.sax.saxutils import unescape > unescape(t) 'The prompt is >>>' >> s =
It's generating HTML perhaps XML In the text , If the correct conversion of special marker characters is a detail that can easily be ignored . Especially when you use print() Function or other string formatting to produce output . Use something like html.escape() The tool function can easily solve this kind of problem .
If you want to deal with text in other ways , There are also some other utility functions, such as xml.sax.saxutils.unescapge() Can help you . However , You should first investigate how to use a proper parser . such as , If you're dealing with HTML or XML Text , Use some parsing module, such as html.parse or xml.etree.ElementTree It has automatically processed the replacement details for you .