Py-pypdf2

Jul 20, 2023

Pure-Python PDF toolkit

PyPDF2 is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, …),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

PyPDF2 is a versatile Python library for reading and writing PDF files. It is available as a FreeBSD port, so it can be installed on any system running FreeBSD. In this article, we’ll guide you through using the py-pypdf2 port effectively and show some of its features.

The FreeBSD platform offers a robust ports system, which allows thousands of third-party software packages to be built and installed from source with ease. py-pypdf2 is one such port, found in the print category, and provides a high-level interface for managing and manipulating PDF files.

Installation and Setup

To get started with py-pypdf2, we first need to install it. This can be done via the FreeBSD ports system. Open a terminal on your FreeBSD system and run the following command to build and install the port:

cd /usr/ports/print/py-pypdf2/ && make install clean

This command compiles and installs py-pypdf2 from the ports collection, then cleans up the build files. You will need root access to install ports. If you run into any issues, make sure your ports tree is up to date. You can update it by running the following command:

portsnap fetch update
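
Once the build finishes, a quick check from Python can confirm that the module is visible to the interpreter. The following is a minimal sketch that uses only the standard library; the helper name is_module_available is our own and is not part of PyPDF2.

```python
import importlib.util


def is_module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no module of that name is found on sys.path.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "PyPDF2" is the import name used throughout this article.
    print("PyPDF2 available:", is_module_available("PyPDF2"))
```

If this reports False, the port was most likely installed for a different Python version than the one you are running.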

Using py-pypdf2

Let’s now dive into how to use py-pypdf2 for reading and manipulating PDF files.

Reading PDF Files

The primary class for reading PDF files in py-pypdf2 is the PdfFileReader class.

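Here’s a sample code snippet to read a PDF file:

from PyPDF2 import PdfFileReader

def read_pdf(file_path):
    # Open in binary mode; PDF files are not plain text.
    with open(file_path, "rb") as file:
        pdf = PdfFileReader(file)
        # Document information holds metadata such as title and author.
        info = pdf.getDocumentInfo()
        print(f"PDF Info: {info}")

read_pdf("/path/to/your/pdf")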

Just replace "/path/to/your/pdf" with the actual path to your PDF file.
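
The document information printed above is keyed by PDF Info entries such as "/Title" and "/Author". As a rough sketch, assuming the returned object behaves like a dictionary (or is None when the file carries no metadata), a small helper can map those entries to friendlier names. The function name summarize_info and the choice of fields are our own, for illustration only.

```python
def summarize_info(info):
    """Map a PDF document-information dictionary to friendlier key names.

    `info` is any mapping keyed by PDF Info entries such as "/Title",
    or None when the file carries no document information.
    """
    wanted = {
        "title": "/Title",
        "author": "/Author",
        "creator": "/Creator",
        "producer": "/Producer",
    }
    if info is None:
        return {name: None for name in wanted}
    # Missing entries become None instead of raising KeyError.
    return {name: info.get(key) for name, key in wanted.items()}


# A plain dict stands in here for the object returned by getDocumentInfo().
sample = {"/Title": "Annual Report", "/Author": "Jane Doe"}
print(summarize_info(sample))
```

Inside read_pdf, the same helper could be called as summarize_info(pdf.getDocumentInfo()) instead of printing the raw object.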

Merging PDF Files

py-pypdf2 makes it easy to combine multiple PDF files into a single file. The PdfFileMerger class is used for this.

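from PyPDF2 import PdfFileMerger

pdfs = ["file1.pdf", "file2.pdf", "file3.pdf"]
merger = PdfFileMerger()

# Append each input file in order.
for pdf in pdfs:
    merger.append(pdf)

# Write the combined document and release the open files.
merger.write("merged.pdf")
merger.close()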

This piece of code combines file1.pdf, file2.pdf, and file3.pdf into a single PDF file called merged.pdf.
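
In practice the list of input files often comes from a directory rather than being typed out by hand. The sketch below shows one way to do that, assuming the files should be merged in name order. The helper names collect_pdfs and merge_directory are hypothetical, and the PyPDF2 import is kept inside the function so the file-collection step can be tried on its own.

```python
from pathlib import Path


def collect_pdfs(directory):
    """Return the PDF files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def merge_directory(directory, output_path):
    """Merge every PDF in `directory` into `output_path`, in name order."""
    # Imported here so collect_pdfs stays usable without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    try:
        for pdf in collect_pdfs(directory):
            merger.append(str(pdf))
        merger.write(str(output_path))
    finally:
        # Close the merger even if appending or writing fails.
        merger.close()
```

Write the merged file outside the input directory, otherwise a later run would pick up the previous output as one of its inputs.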

Splitting PDF Files

Similarly, py-pypdf2 allows you to split a single PDF file into multiple PDF files. We use the PdfFileWriter class for this purpose.

from PyPDF2 import PdfFileWriter, PdfFileReader

def split_pdf(file_path, output_path):
    pdf = PdfFileReader(file_path)

    for page in range(pdf.getNumPages()):
        # A fresh writer per page, so each output file holds one page.
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        # Page indices start at 0; file names start at page_1.pdf.
        with open(f"{output_path}page_{page + 1}.pdf", "wb") as output_pdf:
            pdf_writer.write(output_pdf)

split_pdf("/path/to/your/pdf", "/path/to/output/directory/")

This function splits the provided PDF file into separate PDF files for each page.
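
Splitting does not have to be one page per file. As a hedged sketch of the bookkeeping involved, the helpers below compute page ranges for fixed-size chunks and build zero-padded file names so the outputs sort correctly. Both function names and the "part_" prefix are our own inventions, and no PyPDF2 calls are needed for this part.

```python
def page_chunks(num_pages, chunk_size):
    """Yield (start, stop) page index pairs covering num_pages in chunks.

    Indices are zero-based and `stop` is exclusive, like range().
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, num_pages, chunk_size):
        yield start, min(start + chunk_size, num_pages)


def chunk_filename(prefix, start, stop, num_pages):
    """Build a file name with one-based, zero-padded page numbers."""
    # Pad to the width of the largest page number in the document.
    width = len(str(num_pages))
    return f"{prefix}{start + 1:0{width}d}-{stop:0{width}d}.pdf"


# A 7-page document in chunks of 3 pages:
for start, stop in page_chunks(7, 3):
    print(chunk_filename("part_", start, stop, 7))
```

Each (start, stop) pair can then drive the loop in split_pdf: create one PdfFileWriter per pair and add pages start through stop - 1 to it before writing the file.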

Benefits of Using py-pypdf2

py-pypdf2 is a simple yet powerful tool. It offers a high-level API to manipulate PDFs. Its functionality is not limited to just reading, writing, merging, and splitting PDF files. It also supports adding watermarks, encrypting and decrypting PDF files, and more.

Furthermore, as py-pypdf2 is open-source, it is highly flexible and customizable, and it benefits from the active contributions of the community.

Lastly, as it’s available through the FreeBSD ports collection, it’s a breeze to install on any FreeBSD system.

In Conclusion

While py-pypdf2 offers a wide array of features for manipulating PDF files, we have only scratched the surface. We encourage you to explore the excellent [official documentation](https://pythonhosted.org/PyPDF2/).

Remember, FreeBSD is not just about serving web requests or providing a development environment. It’s also a fantastic platform for tasks like document processing. Explore more ports at the [FreeBSD ports collection](https://freebsdsoftware.org/). Other ports you might find interesting include [pngquant](https://freebsdsoftware.org/graphics/pngquant.html) for image processing and [nmap](https://freebsdsoftware.org/security/nmap.html) for network mapping and security auditing.

We hope you found this guide useful and that it helps you get started with py-pypdf2 on FreeBSD. Happy PDF processing!


Check out these related ports:
  • Yatex - Yet Another LaTeX mode and html mode on Emacs
  • Xtexsh - Tcl/Tk-based simple TeX interface
  • Xreader - Multi-format document reader
  • Xpp - X11-based printer manager for CUPS
  • Xpdfopen - Command line utility for PDF viewers
  • Xmbibtex - Reference manager based on the BibTeX file format
  • Xfce4-print - Print system support for the Xfce Desktop
  • Utopia - Adobe Utopia typeface for Groff
  • Typetools - Tools for manipulating fonts
  • Txtbdf2ps - Translator TXT + BDF to PS
  • Ttfquery - FontTools-based package for querying system fonts
  • Ttfautohint - Automatic font hinting library
  • Ttf2pt1 - True Type Font to Postscript Type 1 Converter
  • Trueprint - Print program listings on postscript printer
  • Transfig - Tools to convert Xfig .fig files