Html2text

Jul 20, 2023

Converts HTML documents into plain text

html2text is a command line utility, written in C++, that converts HTML documents HTML 3.2 into plain text ISO 8859-1.

Each HTML document is loaded from a location indicated by an URI or read from standard input, and formatted into a stream of plain text characters that is written to standard output or into an output-file. The input-URI may specify a remote site, from that the documents are loaded with the Hypertext Transfer Protocol HTTP. The program is even able to preserve the original positions of table fields and accepts also syntactically incorrect input, attempting to interpret it “reasonably”. The rendering is largely customisable through an RC file.



Checkout these related ports:
  • Zxing-cpp - ZXing C++ Library for QR code recognition
  • Zu-hunspell - Zulu hunspell dictionaries
  • Zu-aspell - Aspell Zulu dictionary
  • Zq - Easier and faster alternative to jq
  • Zorba - General purpose C++ XQuery processor
  • Zenxml - Simple C++ XML Processing
  • Zed - Command-line tool to manage and query Zed data lakes
  • Yq - Command-line YAML and XML processor, jq wrapper for YAML/XML documents
  • Yould - Pronounceable word generator
  • Yodl - Easy to use but powerful document formatting/preparation language
  • Yi-hunspell - Yiddish hunspell dictionaries
  • Yi-aspell - Aspell Yiddish dictionary
  • Yelp-xsl - DocBook XSLT stylesheets for yelp
  • Yelp-tools - Utilities to help manage documentation for Yelp and the web
  • Ydiff - Diff readability enhancer for color terminals