# CrawlStrike
CrawlStrike is a high-performance, multiprocessed recursive web crawler built for reconnaissance and surface mapping. It uses Manager-based shared state to perform lightning-fast link discovery while ensuring no URL is processed twice.
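The shared-state deduplication described above can be sketched with `multiprocessing.Manager` (a minimal illustration; the `claim` and `worker` names are invented here and this is not CrawlStrike's actual code):

```python
import multiprocessing

def claim(url, visited, lock):
    """Atomically mark a URL as visited; True only for the first claimant."""
    with lock:
        if url in visited:
            return False
        visited[url] = True
        return True

def worker(urls, visited, lock, found):
    for url in urls:
        if claim(url, visited, lock):
            found.append(url)  # only one process ever handles each URL

if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # assumes a POSIX host
    with ctx.Manager() as mgr:
        visited, found = mgr.dict(), mgr.list()
        lock = mgr.Lock()
        batches = [
            ["https://example.com/", "https://example.com/a"],
            ["https://example.com/a", "https://example.com/b"],  # overlaps batch 1
        ]
        procs = [ctx.Process(target=worker, args=(b, visited, lock, found))
                 for b in batches]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(sorted(found))  # each URL appears exactly once
```

Because the `Manager` dict and lock are shared proxies, a URL claimed by one worker is immediately visible to all others, which is what prevents double-crawling.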
## Key Features

- **Multiprocessed Engine:** Leverages all CPU cores for simultaneous request handling.
- **Deep Extraction:** Parses HTML tags (`a`, `script`, `img`, `iframe`, `form`); regex-based discovery in `javascript`, `json`, `xml`, and `txt` responses for absolute and relative paths.
- **Resumable Scans:** Automatic state saving to `.pkl` files. If a scan is interrupted with `Ctrl+C`, restart the script with the same parameters to resume exactly where you left off.
- **Flexible Proxying:** Native support for both HTTP and SOCKS5 proxies (e.g., Tor integration).
- **Categorized Logging:** Automatically sorts findings into `2xx.txt`, `3xx.txt`, `4xx.txt`, `5xx.txt`, and `error.txt`.
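Resumable state of the kind described above can be sketched with `pickle` (an illustrative sketch; the file name and state structure here are assumptions, not CrawlStrike's actual format):

```python
import pickle
from pathlib import Path

STATE_FILE = Path("scan_state.pkl")  # hypothetical file name

def save_state(visited, queue):
    """Persist the crawl frontier so an interrupted scan can resume."""
    with STATE_FILE.open("wb") as fh:
        pickle.dump({"visited": set(visited), "queue": list(queue)}, fh)

def load_state():
    """Return (visited, queue) from disk, or empty state on a fresh scan."""
    if STATE_FILE.exists():
        with STATE_FILE.open("rb") as fh:
            state = pickle.load(fh)
        return state["visited"], state["queue"]
    return set(), []
```

Saving on `Ctrl+C` (e.g., from a `KeyboardInterrupt` handler) and loading at startup is enough to continue a scan with the same parameters.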
## Installation

```shell
pip3 install -r requirements.txt
```

## Usage

```shell
python3 crawlstrike.py [URL] [OPTIONS]
```

### Arguments

| Argument | Description |
| --- | --- |
| `-w, --workers` | Number of parallel processes (default: CPU count) |
| `--proxy` | HTTP/HTTPS proxy (e.g., `http://127.0.0.1:8080`). If a SOCKS5 proxy is defined, the HTTP proxy is disabled. |
| `--socks` | SOCKS proxy (e.g., `socks5://127.0.0.1:9050`) |
| `--no-subdomains` | Restrict the crawl strictly to the main domain |
| `--header` | Add custom HTTP headers (format: `"Key: Value"`) |
| `--follow-redirect` | Follow HTTP redirects (3xx) |
| `--output` | Specify the output folder (defaults to the domain name) |
## Examples
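Illustrative invocations built from the documented flags (these command lines are assumptions based on the options table, not copied from the project):

```shell
# Basic crawl using all CPU cores
python3 crawlstrike.py https://example.com

# 8 workers, routed through Tor, main domain only
python3 crawlstrike.py https://example.com -w 8 --socks socks5://127.0.0.1:9050 --no-subdomains

# Custom header, follow redirects, custom output folder
python3 crawlstrike.py https://example.com --header "Authorization: Bearer TOKEN" --follow-redirect --output results
```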
## Output

By default, the script creates an output folder named after the target URL's domain. If an output folder is specified with `--output`, results are written there instead.
## Error Handling (`error.txt`)

The `error.txt` file captures non-HTTP network failures:

- **Network:** `ConnectError`, `ConnectTimeout` (DNS or firewall issues)
- **Protocol:** `SSLError`, `ProtocolError` (encryption/handshake failures)
- **Streaming:** `ReadTimeout`, `ReadError` (interrupted data transfer)
- **Logic:** `InvalidURL`, `RemoteProtocolError`
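The split between status-code buckets and `error.txt` can be mimicked with two small helpers (names like `bucket_for` and `error_line` are invented for illustration; the real implementation is not shown in this README):

```python
def bucket_for(status_code):
    """Map an HTTP status code to its log file (2xx.txt .. 5xx.txt)."""
    if 200 <= status_code < 600:
        return f"{status_code // 100}xx.txt"
    return "error.txt"  # anything outside a valid HTTP status range

def error_line(url, exc):
    """Format a non-HTTP failure (e.g., ConnectTimeout) for error.txt."""
    return f"{type(exc).__name__}: {url} ({exc})"
```

A request that raises before any response arrives never has a status code, so it bypasses the buckets entirely and is recorded via `error_line` instead.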