Crawl

Jul 20, 2023

Small, efficient web crawler with advanced features

The crawl utility starts a depth-first traversal of the web at the specified URLs. It stores all JPEG images that match the configured constraints. Crawl is fairly fast and supports graceful termination: after being stopped, it can be restarted at exactly the point where it left off. Crawl keeps a persistent database that allows multiple crawls without revisiting sites.
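
To illustrate the general approach, here is a minimal Python sketch of a depth-first crawler with a persistent visited-URL database, so that a rerun resumes without revisiting pages. Crawl itself is written in C, so this is only an outline of the idea; names such as VISITED_DB and MAX_DEPTH are made-up examples, not crawl's configuration.

    # Depth-first crawl with a persistent visited-URL database.
    # Illustrative sketch only, not crawl's actual implementation.
    import shelve
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urldefrag
    from urllib.request import urlopen

    VISITED_DB = "visited.db"   # hypothetical path for the persistent database
    MAX_DEPTH = 3               # hypothetical depth limit

    class LinkParser(HTMLParser):
        """Collects href targets from anchor tags."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(url, depth, visited):
        url, _ = urldefrag(url)          # drop fragments so URLs compare cleanly
        if depth > MAX_DEPTH or url in visited:
            return
        visited[url] = True              # persisted at once; a restart skips it
        try:
            with urlopen(url, timeout=10) as resp:
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    return
                html = resp.read().decode("utf-8", errors="replace")
        except (OSError, ValueError):    # network errors, unsupported schemes
            return
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            crawl(urljoin(url, link), depth + 1, visited)  # depth-first recursion

    if __name__ == "__main__":
        # shelve keeps the visited set on disk, so rerunning the script
        # resumes without revisiting URLs already seen.
        with shelve.open(VISITED_DB) as visited:
            crawl("https://example.com/", 0, visited)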

The main reason for writing crawl was the lack of simple open source web crawlers. Crawl is only a few thousand lines of code and fairly easy to debug and customize.

Some of the main features:

  • Saves encountered JPEG images
  • Image selection based on regular expressions and size constraints (see the sketch after this list)
  • Resume previous crawl after graceful termination
  • Persistent database of visited URLs
  • Very small and efficient code
  • Supports robots.txt
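
As a rough illustration of the image-selection and robots.txt features above, the Python sketch below filters candidate URLs with a regular expression, applies size bounds, and consults robots.txt via the standard library. The pattern and byte limits are hypothetical examples, not crawl's defaults.

    # Sketch of the image-selection step: keep a fetched JPEG only if its
    # URL matches a filter pattern and its size falls within configured
    # bounds, and consult robots.txt before fetching at all.
    import re
    from urllib.parse import urlparse
    from urllib.robotparser import RobotFileParser

    URL_FILTER = re.compile(r"\.jpe?g$", re.IGNORECASE)  # hypothetical pattern
    MIN_BYTES, MAX_BYTES = 10_000, 5_000_000             # hypothetical bounds

    def allowed_by_robots(url, user_agent="crawl"):
        """Checks the site's robots.txt before fetching a URL."""
        parts = urlparse(url)
        rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
        try:
            rp.read()
        except OSError:
            return False     # be conservative if robots.txt is unreachable
        return rp.can_fetch(user_agent, url)

    def want_image(url, data):
        """Applies the regex and size constraints to a candidate image."""
        if not URL_FILTER.search(urlparse(url).path):
            return False
        return MIN_BYTES <= len(data) <= MAX_BYTES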


Check out these related ports:
  • Zope213 - Object-based web application platform Version 2.13
  • Zola - Fast static site generator
  • Zgrab2 - Fast Go application scanner
  • Zerowait-httpd - Lightweight and fast HTTP server
  • Zenphoto - Simpler web photo gallery
  • Zend-framework - Framework for developing PHP web applications
  • Yuicompressor - The Yahoo! JavaScript and CSS Compressor
  • Ytdl - YouTube downloader written in Go
  • Yt-dlp - Command-line program for downloading videos from various platforms
  • Youtube_dl - Program for downloading videos from various services
  • Yourls - Your Own URL Shortener
  • You-get - Dumb downloader that scrapes the web
  • Yaws - Web server for dynamic content written in Erlang
  • Yarr - Yet another RSS reader
  • Yarn - Package manager for node, alternative to npm (meta port)