p5-HTML-ExtractContent
0.12_1
Perl extension for HTML content extractor with scoring heuristics
HTML::ExtractContent is a module for extracting content from HTML with scoring heuristics. It guesses which block of HTML looks like content according to scores that depend on the amount of punctuation and the length of the non-tag text. It also guesses whether the content ends in the block or continues into the next block.
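The scoring idea described above can be illustrated with a small sketch. This is not the module's actual algorithm or API: the function names `score_block` and `best_block`, the weighting formula, and the choice to split blocks on `<div>` tags are all assumptions made for illustration. The "does the content continue into the next block" guess is left out.

```python
import re

# Assumed punctuation set (ASCII plus two CJK marks); the module's real set may differ.
PUNCTUATION = re.compile(r"[.,!?;:\u3001\u3002]")
TAG = re.compile(r"<[^>]+>")


def score_block(block_html: str) -> float:
    """Score one block: longer non-tag text with more punctuation scores higher."""
    text = TAG.sub("", block_html).strip()
    if not text:
        return 0.0
    punct = len(PUNCTUATION.findall(text))
    # Hypothetical weighting: text length scaled by a punctuation bonus.
    return float(len(text) * (1 + punct))


def best_block(html: str) -> str:
    """Split on <div> boundaries (an assumption) and return the top block's text."""
    blocks = re.split(r"(?i)</?div[^>]*>", html)
    top = max(blocks, key=score_block)
    return TAG.sub("", top).strip()


sample = (
    "<div><a href='/'>Home</a> <a href='/about'>About</a></div>"
    "<div><p>Extraction works by scoring blocks. "
    "Long, punctuated text wins.</p></div>"
)

print(best_block(sample))
```

In this sketch the navigation block scores low because its visible text is short and has no punctuation, while the paragraph block scores high on both counts.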
Origin: www/p5-HTML-ExtractContent
Category: www
Size: 50.3KiB
License: ART10, GPLv1+
Maintainer: perl@FreeBSD.org
Dependencies: 4 packages
Required by: 0 packages
$ pkg install p5-HTML-ExtractContent
Dependencies (4)
More in www
py311-requests 2.32.5 - Python HTTP for Humans
p5-libwww 6.81 - Perl5 library for WWW access
p5-HTML-Parser 3.83 - Perl5 module for parsing HTML documents
php84-session 8.4.16 - The session shared extension for php
p5-Catalyst-Runtime 5.90132_1 - Elegant MVC Web Application Framework (Runtime)
py311-django42 4.2.29 - High-level Python Web Framework
apache24 2.4.66 - Version 2.4.x of Apache web server
p5-HTTP-Message 7.01 - Representation of HTTP style messages
p5-Template-Toolkit 3.102 - Extensible template processing system
p5-Plack 1.0051 - Perl extension of PSGI reference implementation and utilities