May 26, 2018

Python wrapper for HTML Tidy (tidylib)

PyTidyLib is a Python package that wraps the HTML Tidy library. It lets you “fix” invalid (X)HTML markup from Python code. Some of the library’s many capabilities include:

  • Clean up unclosed tags and unescaped characters such as ampersands
  • Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
  • Convert named entities to numeric entities, which can then be used in XML documents without an HTML doctype
  • Clean up, to an extent, HTML produced by programs such as Word
  • Indent the output, including proper (i.e. no) indenting for pre elements, which some (X)HTML indenting code overlooks
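
The capabilities listed above map onto HTML Tidy configuration options, which PyTidyLib accepts as a dictionary. The following is a minimal sketch, not the package's official example: the helper names `build_tidy_options` and `clean_markup` are hypothetical, and it assumes that PyTidyLib and the underlying HTML Tidy shared library are both installed. The option names (`output-xhtml`, `doctype`, `numeric-entities`, `indent`, `word-2000`) are HTML Tidy settings; which ones are honored may depend on the Tidy version in use.

```python
def build_tidy_options(strict=False, numeric_entities=True,
                       indent=True, word_cleanup=False):
    """Build a dict of HTML Tidy options (hypothetical helper).

    Keys are HTML Tidy option names; values use 1/0 for booleans.
    """
    return {
        # Emit XHTML rather than HTML 4.
        "output-xhtml": 1,
        # Add a doctype if one is missing: strict or transitional.
        "doctype": "strict" if strict else "transitional",
        # Convert named entities (e.g. &otilde;) to numeric ones.
        "numeric-entities": 1 if numeric_entities else 0,
        # Indent block-level content in the output.
        "indent": 1 if indent else 0,
        # Strip some of the surplus markup that Word produces.
        "word-2000": 1 if word_cleanup else 0,
    }


def clean_markup(markup, **kwargs):
    """Run markup through HTML Tidy and return (document, errors).

    The import is done here so that this module still loads when
    PyTidyLib is not installed (an assumption made for this sketch).
    """
    from tidylib import tidy_document

    return tidy_document(markup, options=build_tidy_options(**kwargs))
```

With the library installed, calling `clean_markup('<p>f&otilde;o <img src="bar.jpg">')` should return a two-item tuple: the repaired document as a string, and a string of warnings and errors reported by Tidy.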

WWW http//