Jul 20, 2023

Character encoding aliases for legacy web content

To stay compatible with legacy web content when interpreting a header such as Content-Type: text/html; charset=latin1, tools need to use a particular set of aliases for encoding labels, along with some overriding rules.

For example, on the web US-ASCII and iso-8859-1 are actually aliases for windows-1252, and a UTF-8 or UTF-16 BOM takes precedence over any other encoding declaration.
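As a rough illustration of the aliasing described above, the sketch below resolves a label to a canonical encoding name. It uses only the Python standard library and is not this module's API: the LABELS table is a hypothetical, heavily abbreviated excerpt, and resolve_label is a name invented for the example. The Encoding standard lists many more labels.

```python
import string

# Hypothetical, abbreviated label table for illustration only.
# The Encoding standard defines the full list of labels.
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "us-ascii": "windows-1252",
    "ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "windows-1252": "windows-1252",
}

# Labels are matched ASCII case-insensitively, after stripping ASCII whitespace.
ASCII_WHITESPACE = "\t\n\f\r "
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def resolve_label(label):
    """Return the canonical encoding name for a label, or None if unknown."""
    key = label.strip(ASCII_WHITESPACE).translate(ASCII_LOWER)
    return LABELS.get(key)


if __name__ == "__main__":
    for label in ("latin1", " US-ASCII ", "UTF8", "bogus"):
        print(repr(label), "->", resolve_label(label))
```

Returning None for an unknown label leaves the fallback decision to the caller, which is one reasonable design choice among several.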

The Encoding standard defines all such details so that implementations do not have to reverse-engineer each other.

This module provides encoding labels and BOM detection; the actual encoders and decoders are Python’s.
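The split between BOM detection and Python's own codecs might look roughly like the following. This is a minimal sketch using only the standard codecs module, not the module's actual interface: sniff_bom and decode_legacy are hypothetical names, and the declared argument is assumed to already be a Python codec name.

```python
import codecs


def sniff_bom(data):
    """Return (codec_name, remaining_bytes); codec_name is None if no BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", data[len(codecs.BOM_UTF8):]
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", data[len(codecs.BOM_UTF16_BE):]
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", data[len(codecs.BOM_UTF16_LE):]
    return None, data


def decode_legacy(data, declared="windows-1252"):
    """Decode bytes, letting a BOM override the declared encoding.

    Returns (text, codec_name_used). The decoding itself is delegated
    to Python's built-in codecs.
    """
    bom_codec, body = sniff_bom(data)
    codec = bom_codec or declared
    return body.decode(codec, errors="replace"), codec


if __name__ == "__main__":
    # The UTF-8 BOM wins even though windows-1252 was declared.
    print(decode_legacy(b"\xef\xbb\xbfcaf\xc3\xa9", "windows-1252"))
    # No BOM: the declared encoding is used.
    print(decode_legacy(b"caf\xe9", "windows-1252"))
```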

Check out these related ports:
  • Zbase32 - Base32 Encoder/Decoder
  • Ytnef - Unpack data in MS Outlook TNEF format
  • Yj - Convert between YAML, TOML, JSON, and HCL
  • Yj-bruceadams - Command line tool that converts YAML to JSON
  • Xml2c - Convert an XML file into C struct/string declarations
  • Xdeview - X11 program for uu/xx/Base64/BinHex/yEnc de-/encoding
  • Wkhtmltopdf - Convert HTML (or live webpages) to PDF or image
  • Uulib - Library for uu/xx/Base64/BinHex/yEnc de-/encoding
  • Uudeview - Program for uu/xx/Base64/BinHex/yEnc de-/encoding
  • Unix2dos - Convert ASCII newlines between CR/LF and LF
  • Tuc - Text to Unix Conversion
  • Trans - Character encoding converter generator
  • Tnef - Unpack data in MS Outlook TNEF format
  • Ta2as - TASM to AT&T asm syntax converter (GNU AS)
  • Showkey - Display cooked key sequences (keycap-to-keystrokes mappings)