P5-text-language-guess

Jul 20, 2023

Trained module to guess a document’s language

Text::Language::Guess guesses a document’s language. Its implementation is simple: using “Text::ExtractWords” and “Lingua::StopWords” from CPAN, it counts how many of the known stopwords for each language supported by “Lingua::StopWords” appear in the document.

Each word in the document that is recognized as a stopword of a particular language scores one point for that language.

The “language_guess” function takes a document as a parameter and returns the abbreviation of the language that it is most likely written in.
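
The scoring scheme described above can be illustrated with a minimal sketch. This is not the module's Perl code: it is written in Python for illustration, the names STOPWORDS, language_scores and guess_language are hypothetical, and the tiny stopword sets are made-up stand-ins for the full per-language lists the module takes from CPAN.

```python
import re
from collections import Counter

# Hypothetical, heavily truncated stopword lists (assumption for this sketch);
# the real module uses the complete per-language lists from CPAN.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein"},
    "fr": {"le", "la", "les", "et", "est", "un", "une"},
}


def language_scores(text):
    """Return a Counter mapping each language code to its stopword hits."""
    words = re.findall(r"\w+", text.lower())
    scores = Counter()
    for word in words:
        for lang, stops in STOPWORDS.items():
            if word in stops:
                scores[lang] += 1  # one point per recognized stopword
    return scores


def guess_language(text):
    """Return the code of the highest-scoring language, or None if no hits."""
    scores = language_scores(text)
    if not scores:
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    print(guess_language("The cat is in the house and it sleeps"))
    print(guess_language("Der Hund ist nicht das Problem und die Katze"))
```

With such short lists the sketch is only reliable on text that happens to contain the listed words. A word that is a stopword in several languages scores a point for each of them, which is why longer documents tend to give clearer results.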


