P5-html-gumbo

Jul 20, 2023

HTML5 parser based on gumbo C library

Gumbo is an implementation of the HTML5 parsing algorithm implemented as a pure C99 library with no outside dependencies.

Goals and features of the C library

  • Fully conformant with the HTML5 spec.
  • Robust and resilient to bad input.
  • Simple API that can be easily wrapped by other languages. This is one of such wrappers.
  • Support for source locations and pointers back to the original text. Not exposed by this implementation at the moment.
  • Relatively lightweight, with no outside dependencies.
  • Passes all html5lib-0.95 tests.
  • Tested on over 2.5 billion pages from Google’s index.

See also https//github.com/ruz/HTML-Gumbo



Checkout these related ports:
  • Zope213 - Object-based web application platform Version 2.13
  • Zola - Fast static site generator
  • Zgrab2 - Fast Go application scanner
  • Zerowait-httpd - Lightweight and fast http server
  • Zenphoto - Simpler web photo gallery
  • Zend-framework - Framework for developing PHP web applications
  • Yuicompressor - The Yahoo! JavaScript and CSS Compressor
  • Ytdl - YouTube downloader written in Go
  • Yt-dlp - Command-line program for downloading videos from various platforms
  • Youtube_dl - Program for downloading videos from various services
  • Yourls - Your Own URL Shortener
  • You-get - Dumb downloader that scrapes the web
  • Yaws - Web server for dynamic content written in Erlang
  • Yarr - Yet another rss reader
  • Yarn - Package manager for node, alternative to npm (meta port)