Py-pyocr

Jul 20, 2023

Python wrapper for OCR engines (Tesseract, Cuneiform, etc.)

PyOCR is an optical character recognition (OCR) tool wrapper for Python. That is, it helps you use various OCR tools from a Python program.

It has been tested only on GNU/Linux systems. It should also work on similar systems (*BSD, etc.). It may or may not work on Windows, Mac OS X, etc.

Supported OCR tools

  • Libtesseract (Python bindings for the C API)
  • Tesseract (wrapper: fork + exec)
  • Cuneiform (wrapper: fork + exec)
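
As a minimal sketch of how a program might pick one of these tools: PyOCR exposes `pyocr.get_available_tools()`, which lists the tools it found on the system. The helper name `first_available_tool` is hypothetical and only illustrates the selection step. The example assumes PyOCR and at least one OCR engine are installed, and falls back to a message when they are not.

```python
def first_available_tool(tools):
    """Return the first tool in the list, or None if the list is empty.

    Hypothetical helper: it simply takes whatever the caller's list
    puts first.
    """
    return tools[0] if tools else None


def main():
    try:
        import pyocr  # imported lazily so the helper works without PyOCR
    except ImportError:
        print("PyOCR is not installed")
        return

    tool = first_available_tool(pyocr.get_available_tools())
    if tool is None:
        print("No OCR tool found (is Tesseract or Cuneiform installed?)")
        return

    print("Using:", tool.get_name())
    print("Languages:", tool.get_available_languages())


if __name__ == "__main__":
    main()
```

Keeping the selection in a small function makes it easy to swap in a different policy later, for example preferring the fork + exec wrapper over the C API bindings.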

Features

  • Supports all the image formats supported by Pillow, including jpeg, png, gif, bmp, tiff and others
  • Various output types: text only, bounding boxes, etc.
  • Orientation detection (Tesseract and libtesseract only)
  • Can focus on digits only (Tesseract and libtesseract only)
  • Can save and reload boxes in hOCR format
  • PDF generation (libtesseract only)
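
The output types and orientation detection listed above can be sketched roughly as follows. In PyOCR the output type is chosen by passing a builder from `pyocr.builders` to `image_to_string()`. The names `BUILDERS`, `builder_name`, `recognize` and `orientation` are hypothetical helpers introduced for illustration, and the default language `"eng"` is an assumption about the installed language data.

```python
# Hypothetical mapping from an output type to a class in pyocr.builders.
BUILDERS = {
    "text": "TextBuilder",       # plain text only
    "words": "WordBoxBuilder",   # words with bounding boxes
    "lines": "LineBoxBuilder",   # lines with bounding boxes
    "digits": "DigitBuilder",    # digits only (Tesseract and libtesseract)
}


def builder_name(output="text"):
    """Return the pyocr.builders class name for an output type."""
    try:
        return BUILDERS[output]
    except KeyError:
        raise ValueError(f"unknown output type: {output!r}") from None


def recognize(tool, image, output="text", lang="eng"):
    """Run OCR on a Pillow image with the requested output type."""
    import pyocr.builders  # lazy import: only needed when OCR actually runs

    builder = getattr(pyocr.builders, builder_name(output))()
    return tool.image_to_string(image, lang=lang, builder=builder)


def orientation(tool, image, lang="eng"):
    """Return the tool's orientation result, or None if unsupported."""
    if not tool.can_detect_orientation():
        return None
    return tool.detect_orientation(image, lang=lang)
```

A caller would typically open the image with Pillow (`Image.open("scan.png")`, where the file name is a placeholder) and pass it to `recognize()` together with a tool obtained from `pyocr.get_available_tools()`.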

