CeWL Cheat Sheet
Spider a site and build a custom wordlist from discovered words.
Overview
CeWL (Custom Word List generator) crawls web pages and extracts words/strings for password attacks or OSINT. Ethics: only spider sites you own or have explicit permission to test; scraping third-party sites without authorization may violate terms of service and law. Do not use harvested terms to attack unrelated accounts.
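The authorization rule above can be enforced mechanically before any crawl starts. The following is a minimal sketch under stated assumptions: `scope.txt` is a hypothetical allowlist with one authorized host per line, and `url_host`, `in_scope`, and `spider` are illustrative helper names that are not part of CeWL. With `DRY_RUN=1` (the default here) it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical scope guard: refuse to spider hosts missing from an allowlist.
# scope.txt, url_host, in_scope, and spider are illustrative names, not CeWL features.

SCOPE_FILE="${SCOPE_FILE:-scope.txt}"
DRY_RUN="${DRY_RUN:-1}"

# Extract the host from a URL such as https://10.10.10.20:8443/login
url_host() {
  local rest="${1#*://}"        # drop the scheme
  rest="${rest%%/*}"            # drop the path
  rest="${rest##*@}"            # drop any userinfo
  printf '%s\n' "${rest%%:*}"   # drop the port
}

# Succeeds only if the URL's host appears as a whole line in the allowlist.
in_scope() {
  local host
  host="$(url_host "$1")"
  [ -f "$SCOPE_FILE" ] && grep -Fxq -- "$host" "$SCOPE_FILE"
}

# Run (or, in dry-run mode, print) a basic cewl crawl for in-scope targets only.
spider() {
  local url="$1" out="$2"
  if ! in_scope "$url"; then
    echo "refusing: $(url_host "$url") is not in $SCOPE_FILE" >&2
    return 1
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "cewl $url -w $out"
  else
    cewl "$url" -w "$out"
  fi
}
```

After adding an authorized host to scope.txt, `DRY_RUN=0 spider https://10.10.10.20 cewl.txt` would perform the real crawl. The host match is exact, so subdomains have to be listed individually.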
Install
sudo apt install cewl
cewl --help
Essential commands
cewl https://target.htb -w wordlist.txt
cewl -d 2 -m 5 https://10.10.10.20 -w list.txt
cewl -e --email_file emails.txt https://corp.lab
Common workflows
Basic spider:
cewl https://10.10.10.20 -w cewl.txt
wc -l cewl.txt
sort -u cewl.txt -o cewl_unique.txt
Depth and minimum word length:
cewl -d 3 -m 6 --with-numbers https://10.10.10.20/ -w deep.txt
Authentication:
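cewl --auth_type basic --auth_user admin --auth_pass 'Password1!' https://10.10.10.20/ -w authed.txt
CeWL's built-in authentication covers HTTP basic and digest only; it has no form-login mode. For a form login, one option is to reuse an existing session cookie via -H (the cookie name and value here are placeholders):
cewl -H 'Cookie: session=<value>' https://10.10.10.20/ -w authed.txt
Extract emails: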
cewl -e -n --email_file emails.txt https://10.10.10.20
Feed hashcat / hydra:
cewl https://10.10.10.20 -m 4 -w words.txt
hashcat -m 1000 hashes.txt words.txt
hydra -l admin -P words.txt ssh://10.10.10.5
Flags reference
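| Flag | Meaning |
|---|---|
| -w | Write wordlist to file |
| -d | Spider depth (default 2) |
| -m | Minimum word length (default 3) |
| --auth_type | HTTP auth type: basic or digest |
| --auth_user | Username (auth) |
| --auth_pass | Password (auth) |
| -e | Extract email addresses |
| --email_file | Write extracted emails to file |
| -n | Do not output the wordlist |
| --with-numbers | Keep alphanumeric tokens |
| -u | User-Agent |
| -o | Follow off-site links |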
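-m is applied at crawl time, but a similar length cut can be applied later to a saved list without spidering the target again. A minimal sketch, assuming a hypothetical helper named `min_len` and a made-up sample list standing in for CeWL output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only lines at least N characters long,
# similar in effect to re-running cewl with a higher -m.
min_len() {
  local n="$1" file="$2"
  awk -v n="$n" 'length($0) >= n' "$file"
}

# Made-up sample list standing in for real CeWL output.
printf '%s\n' admin the portal Acme2024 hr > sample_words.txt

# Keep words of 5 or more characters.
min_len 5 sample_words.txt > sample_filtered.txt
cat sample_filtered.txt
```

Filtering afterwards keeps the original list intact, so different length thresholds can be tried per service without generating more traffic against the target.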
Tips
- Ethics: authorization is required to test a site at all; beyond that, honor robots.txt in real assessments when the scope requires it.
- Lower -m for languages with short tokens; raise -m to reduce noise.
- Combine company-specific tokens from CeWL with crunch patterns (-t) to generate targeted candidates.
- For targeted mutations, run the CeWL list through hashcat rules (hashcat --stdout -r <rulefile> words.txt) instead of throwing raw rockyou at every service.
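The last tip relies on hashcat's rule engine. To show the idea without requiring hashcat, the sketch below performs a tiny subset of what rules do (capitalize the first letter, append a suffix) in plain awk. It is a stand-in and not hashcat rule syntax; `mutate`, the sample words, and the suffixes are made up for illustration.

```shell
#!/usr/bin/env bash
# Stand-in for rule-based mutation (NOT hashcat rule syntax):
# for each word, emit the word, its capitalized form, and both with each suffix.
mutate() {
  local wordlist="$1"; shift
  awk -v suffixes="$*" '
    BEGIN { n = split(suffixes, suf, " ") }
    {
      cap = toupper(substr($0, 1, 1)) substr($0, 2)
      print $0
      if (cap != $0) print cap
      for (i = 1; i <= n; i++) {
        print $0 suf[i]
        if (cap != $0) print cap suf[i]
      }
    }
  ' "$wordlist" | sort -u
}

# Made-up base words standing in for a CeWL wordlist.
printf '%s\n' acme portal > sample_base.txt

# Suffixes are passed as extra arguments.
mutate sample_base.txt 2024 '!' > sample_candidates.txt
cat sample_candidates.txt
```

Two base words with two suffixes yield twelve candidates here. The resulting file can be passed to hashcat or hydra the same way as words.txt in the workflows above.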