05. Reconnaissance & OSINT
Know Your Target Before You Strike
Reconnaissance is often cited as taking 60–80% of a professional penetration tester's engagement time. The more you know about your target before touching it, the more precise and quiet your attack will be. Attackers who skip recon make noise, leave traces, and miss obvious vectors. Attackers who invest in recon walk through front doors that everyone else missed.
Reconnaissance divides into two categories: Passive (no direct contact with the target's systems, so nothing shows up in their logs) and Active (direct interaction with target systems, which is detectable but more precise). This module covers both in depth, with the tools and techniques used in practice.
👁️ Passive Reconnaissance — Stay Completely Invisible
Passive recon gathers intelligence from publicly available sources without ever touching the target's infrastructure. Done correctly, the target has no idea you're watching and no logs capture your activity.
WHOIS Lookups:
WHOIS queries the registration database for a domain. It can reveal the registrant's name and organization, the registrant's email address (sometimes an IT admin's direct email, which makes it a spear-phishing target), registration and expiration dates, name servers (revealing DNS infrastructure), and occasionally the registrant's phone number. Many registrars now redact registrant contact details behind privacy services, so expect dates and name servers more reliably than personal data. Run `whois target.com` from the command line or use who.is online.
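As a rough illustration of pulling the useful fields out of WHOIS output, the sketch below parses "Key: value" lines from a block of text. The sample text and the list of field names are assumptions for the example; real WHOIS output differs between registries, so treat this as a starting point and not a general parser.

```python
# Hypothetical sample of `whois` output. Real output varies by registry
# and is often partly redacted.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Registry Expiry Date: 2026-08-13T04:00:00Z
Name Server: A.IANA-SERVERS.NET
Name Server: B.IANA-SERVERS.NET
Registrant Email: REDACTED FOR PRIVACY
"""

# Field names chosen for this example; adjust to what your registry returns.
FIELDS = ("Registrar", "Creation Date", "Registry Expiry Date",
          "Name Server", "Registrant Email")


def parse_whois(text: str) -> dict[str, list[str]]:
    """Collect the values of selected 'Key: value' lines from WHOIS text."""
    result: dict[str, list[str]] = {name: [] for name in FIELDS}
    for line in text.splitlines():
        # Split at the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in result:
            result[key].append(value.strip())
    return result


record = parse_whois(SAMPLE_WHOIS)
print(record["Name Server"])
print(record["Registrant Email"])
```

Values are kept as lists because fields such as Name Server usually repeat.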
Google Dorking — Google as a Vulnerability Scanner:
Google's advanced search operators, known as Google Dorks, can find sensitive information that organizations accidentally exposed to search engines. These are real queries used by attackers and penetration testers every day:
- `site:target.com filetype:pdf`: Find all indexed PDFs. May contain org charts, technical specs, internal procedures.
- `site:target.com inurl:admin OR inurl:login OR inurl:dashboard`: Find administrative panels.
- `intitle:"index of" site:target.com`: Find open directory listings exposing file structures.
- `site:target.com ext:sql OR ext:bak OR ext:env OR ext:conf`: Find accidentally exposed database dumps, backups, and config files. These sometimes contain plaintext credentials.
- `site:target.com intext:"password" filetype:txt`: Find text files containing the word password.
- `"@target.com" filetype:xlsx OR filetype:csv`: Find spreadsheets containing email addresses, which double as employee lists for phishing campaigns.
- `site:github.com "target.com" password OR secret OR api_key`: Find leaked credentials in public GitHub repositories. Developers frequently commit API keys and passwords by accident.
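The dork list above is easy to keep as templates, so the same set can be regenerated for every in-scope domain. This is a minimal sketch; the function name `build_dorks` is made up for the example, and it only builds query strings without sending anything to a search engine.

```python
def build_dorks(domain: str) -> list[str]:
    """Return the dork queries from this section, filled in for one domain."""
    templates = [
        'site:{d} filetype:pdf',
        'site:{d} inurl:admin OR inurl:login OR inurl:dashboard',
        'intitle:"index of" site:{d}',
        'site:{d} ext:sql OR ext:bak OR ext:env OR ext:conf',
        'site:{d} intext:"password" filetype:txt',
        '"@{d}" filetype:xlsx OR filetype:csv',
        'site:github.com "{d}" password OR secret OR api_key',
    ]
    return [template.format(d=domain) for template in templates]


for query in build_dorks("target.com"):
    print(query)
```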
Shodan — The Internet-Wide Device Scanner:
Shodan is a search engine that continuously scans the public internet and indexes the services it finds. Unlike Google, which indexes web pages, Shodan indexes banners from open network services: SSH banners, HTTP headers, FTP banners, database responses. Because Shodan does the scanning and you only query its index, using it is passive from the target's point of view, which makes it one of the most useful passive recon tools available.
- `org:"Target Corporation"`: Find all internet-facing assets belonging to an organization.
- `port:3306 org:"Target Corp"`: Find exposed MySQL servers belonging to the target.
- `ssl:"target.com" 200`: Find web servers using SSL certificates with the target's domain in them. This reveals previously unknown subdomains and applications.
- `http.title:"Target Admin"`: Find pages with "Target Admin" in the title. This finds admin panels, router management interfaces, and internal tools accidentally exposed to the internet.
- Shodan also reveals software versions on every discovered service. You can immediately cross-reference against CVE databases to find unpatched vulnerabilities at internet scale.
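When cross-referencing the versions that Shodan reports, compare them numerically and not as strings, since "1.9.0" sorts after "1.14.0" as text. The sketch below assumes plain dotted numeric versions; the service list and the minimum version are hypothetical values picked for the example, not real advisory data.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.14.0' into (1, 14, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def older_than(version: str, minimum: str) -> bool:
    """True if `version` is numerically lower than `minimum`."""
    return version_tuple(version) < version_tuple(minimum)


# Hypothetical findings: (address, product, version).
services = [
    ("203.0.113.5", "nginx", "1.14.0"),
    ("203.0.113.9", "nginx", "1.25.3"),
]
# Hypothetical threshold a tester might take from a vendor advisory.
MINIMUM_NGINX = "1.20.0"

outdated = [host for host, product, version in services
            if product == "nginx" and older_than(version, MINIMUM_NGINX)]
print(outdated)
```

Versions with suffixes, such as "8.9p1", would need extra parsing before this comparison works.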
Censys — Certificate Intelligence:
Censys excels at TLS certificate data. Certificates issued for a domain are publicly logged (Certificate Transparency logs), and Censys indexes them. This is one of the best ways to find subdomains that wordlist brute-forcing would not guess and that Google has not indexed. A company might have 20 public subdomains everyone knows about and 80 internal development/staging subdomains they forgot to restrict, all visible through certificate logs.
LinkedIn and Social Media — Building the Human Attack Surface:
LinkedIn tells you every employee's name and title, which technologies the company uses (engineers list their tech stacks), which vendors and partners the company works with, and the organizational structure (who reports to whom). This intelligence is used to identify IT administrators for spear-phishing, understand what software and versions are deployed internally, find contractors and third parties with internal access, and craft convincing pretexting scenarios for social engineering.
The Wayback Machine (web.archive.org):
Historical website snapshots reveal things that are no longer linked but may still exist on the server: old admin portals from previous versions of the site, backup files referenced in old pages (.zip or .tar.gz archives of the entire site), deprecated API endpoints that still function, and API keys or credentials that were published by accident and later removed from the live site (but remain in the archive).
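The Wayback Machine exposes a CDX index that lists archived URLs for a domain. The sketch below builds such a query URL and filters a list of archived URLs for backup-style file extensions. The extension list and the sample URLs are assumptions for the example, and the code deliberately makes no network request; fetch the URL with whatever HTTP client you prefer.

```python
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Extensions treated as "interesting" here are an illustrative choice.
INTERESTING_EXTENSIONS = (".zip", ".tar.gz", ".bak", ".sql", ".env")


def cdx_query_url(domain: str) -> str:
    """Build a CDX query URL listing archived URLs under a domain."""
    params = {
        "url": f"{domain}/*",    # everything below the domain
        "output": "json",
        "fl": "original",        # return only the original URL field
        "collapse": "urlkey",    # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def interesting_urls(urls: list[str]) -> list[str]:
    """Keep URLs whose path ends in one of INTERESTING_EXTENSIONS."""
    hits = []
    for url in urls:
        # Compare against the path only, so query strings are ignored.
        path = urlsplit(url).path.lower()
        if path.endswith(INTERESTING_EXTENSIONS):
            hits.append(url)
    return hits


# Hypothetical archived URLs standing in for a real CDX response.
archived = [
    "http://target.com/index.html",
    "http://target.com/old/site-backup.tar.gz",
    "http://target.com/db/dump.sql?download=1",
]
print(cdx_query_url("target.com"))
print(interesting_urls(archived))
```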
🔍 Active Reconnaissance — Precise Intelligence Through Direct Contact
Active recon makes direct contact with target systems. It is detectable—your IP address will appear in their logs—but it yields far more precise and actionable intelligence than passive methods alone.
DNS Enumeration — Mapping the Full Domain Space:
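- Zone Transfer Attempt: `dig axfr @ns1.target.com target.com`. A DNS zone transfer dumps every record in the DNS zone: every hostname, every IP, every mail server. Allowing transfers to arbitrary hosts is a serious misconfiguration that still exists in the wild. A successful zone transfer hands you every name the zone publishes in a single request, which often includes internal-looking hosts.
- DNS Record Enumeration: `dig target.com ANY` asks for all record types (A, AAAA, MX, TXT, CNAME, NS), but many modern servers refuse ANY queries or return a minimal answer (RFC 8482), so also query each type separately, for example `dig target.com MX` and `dig target.com TXT`. The TXT records at the domain apex usually carry SPF, which reveals email infrastructure; DMARC is published at _dmarc.target.com and DKIM keys under selector._domainkey.target.com. TXT records sometimes also contain sensitive internal information left by administrators.
- Reverse DNS Lookup: `dig -x 203.0.113.1`. Look up the hostname for an IP. Can reveal internal server naming conventions (e.g., prod-db-01.internal.target.com).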
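To show how a TXT record exposes email infrastructure, the sketch below splits an SPF record into the third-party senders it includes and the IP ranges it authorizes. The sample record is hypothetical, and the parser only handles the `include`, `ip4`, and `ip6` mechanisms; the other SPF mechanisms are ignored.

```python
def parse_spf(txt_record: str) -> dict[str, list[str]]:
    """Split an SPF TXT record into include domains and IP ranges."""
    parsed: dict[str, list[str]] = {"include": [], "ip4": [], "ip6": []}
    terms = txt_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return parsed  # not an SPF record
    for term in terms[1:]:
        term = term.lstrip("+-~?")  # drop the optional qualifier
        # Split at the first colon only, so IPv6 values stay intact.
        name, sep, value = term.partition(":")
        if sep and name.lower() in parsed:
            parsed[name.lower()].append(value)
    return parsed


# Hypothetical record with the surrounding quotes from dig output removed.
sample = ("v=spf1 ip4:203.0.113.0/24 include:_spf.google.com "
          "include:mailgun.org ~all")
print(parse_spf(sample))
```

In this sample the `include` entries indicate which mail providers send on the domain's behalf, and the `ip4` range points at self-hosted mail servers.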
Subdomain Enumeration — Finding Hidden Attack Surfaces:
Organizations have dozens to hundreds of subdomains, many of which are less maintained and less secured than the main site. Development, staging, and internal tools are common targets.
- Passive (Certificate Transparency): `subfinder -d target.com` or search crt.sh. Both query certificate transparency logs. No contact with target at all.
- Active Brute Force: `gobuster dns -d target.com -w /usr/share/wordlists/SecLists/Discovery/DNS/subdomains-top1million-5000.txt`. Tests thousands of common subdomain names against the target's DNS servers.
- Amass: `amass enum -d target.com`. Combines passive and active enumeration. One of the most comprehensive subdomain discovery tools available.
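The passive certificate transparency step can be sketched as follows. crt.sh can return results as JSON in which each entry has a `name_value` field holding one or more hostnames separated by newlines. The sample data below is hypothetical and stands in for a real response, and the function name is made up for the example.

```python
import json


def subdomains_from_ct(ct_json: str, domain: str) -> list[str]:
    """Extract unique hostnames under `domain` from crt.sh-style JSON."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for entry in json.loads(ct_json):
        # name_value can hold several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # treat a wildcard cert as its base name
            if name == domain or name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Hypothetical response body; only the name_value field is used here.
sample = json.dumps([
    {"name_value": "www.target.com\ntarget.com"},
    {"name_value": "*.dev.target.com"},
    {"name_value": "staging.target.com"},
    {"name_value": "www.target.com"},
    {"name_value": "unrelated.example.net"},
])
print(subdomains_from_ct(sample, "target.com"))
```

Deduplicating matters because the same hostname usually appears once per certificate renewal.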
Web Directory Enumeration — Finding Hidden Pages:
- `gobuster dir -u https://target.com -w /usr/share/wordlists/dirb/common.txt -x php,html,txt`: Brute-forces hidden web directories and files. Finds /admin, /backup, /api, /config, and other sensitive endpoints not linked from the main site.
- `ffuf -u https://target.com/FUZZ -w wordlist.txt`: Fast web fuzzer. More flexible than gobuster, useful for discovering API endpoints and hidden parameters.
Banner Grabbing — Identifying Services:
- `nc -v target.com 22`: Connect to SSH and read the banner: "SSH-2.0-OpenSSH_7.4". Now you know the exact SSH version.
- `curl -I https://target.com`: Fetch HTTP headers only. The Server header may reveal "Apache/2.4.49" or "nginx/1.14.0", both directly searchable for CVEs.
- `telnet target.com 25`: SMTP banner reveals mail server software and version.
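Once banners are collected, turning them into (product, version) pairs makes them easy to look up. The sketch below parses the two banner shapes quoted above. The regular expressions are a simplification that covers common cases such as OpenSSH and Apache or nginx Server headers, and they will miss vendors that format banners differently.

```python
import re

# SSH identification string: SSH-<protocol>-<product>_<version> [comment]
SSH_BANNER = re.compile(
    r"^SSH-(?P<proto>[\d.]+)-(?P<product>[^_\s]+)_(?P<version>\S+)")
# HTTP Server header: <product>/<version> [extra tokens]
SERVER_HEADER = re.compile(
    r"^(?P<product>[A-Za-z][\w-]*)/(?P<version>[\d.]+)")


def parse_ssh_banner(banner: str):
    """Return (product, version) from an SSH banner, or None if no match."""
    match = SSH_BANNER.match(banner.strip())
    return (match["product"], match["version"]) if match else None


def parse_server_header(value: str):
    """Return (product, version) from a Server header, or None if no match."""
    match = SERVER_HEADER.match(value.strip())
    return (match["product"], match["version"]) if match else None


print(parse_ssh_banner("SSH-2.0-OpenSSH_7.4"))
print(parse_server_header("Apache/2.4.49 (Unix)"))
print(parse_server_header("nginx"))  # version hidden by the server
```

A `None` result is informative too: it usually means the administrator has suppressed version details.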
📧 Email OSINT — Building the Phishing Target List
Email addresses are often the primary vector for initial access. A well-targeted phishing email to the right person can bypass every technical control in place.
- hunter.io: Discovers email addresses associated with a domain and reveals the email format pattern (e.g., firstname.lastname@target.com, f.lastname@target.com). With the format identified, you can generate valid email addresses for any employee found on LinkedIn even if their specific email isn't listed.
- theHarvester: Command-line OSINT tool that aggregates emails, subdomains, and IP addresses from multiple public sources: `theHarvester -d target.com -b google,linkedin,bing,twitter`. The sources accepted by `-b` differ between versions, so check `theHarvester -h` for the list your install supports.
- HaveIBeenPwned (HIBP): Reveals whether email addresses have appeared in known breach datasets. HIBP reports which breaches an address appears in, not the passwords themselves. If admin@target.com appeared in a breach that exposed password hashes, such as the 2012 LinkedIn breach, that user's password from that breach may have been reused elsewhere, which is a credential stuffing opportunity.
- GitHub OSINT: Search GitHub for the target domain: `"@target.com" OR "target.com/api" in:code`. Developers frequently leak API keys, passwords, and internal endpoint URLs in public repositories.
🎭 Putting It Together: A Real Recon Workflow
Here is a typical recon workflow for a professional penetration test, from zero information to a full picture of the target:
- Scope Definition: Confirm exactly what you're authorized to test (domains, IP ranges, applications). Document it. Never deviate from it.
- Passive OSINT: WHOIS → Shodan → Censys certificate transparency → Google dorking → LinkedIn employee enumeration → GitHub secret scanning → Wayback Machine. No contact with target systems.
- Subdomain Enumeration: crt.sh + subfinder (passive) → gobuster/amass DNS brute force (active). Build a complete list of all accessible subdomains.
- Active Recon: DNS zone transfer attempt → Nmap scan of discovered IP ranges → Web directory enumeration → Banner grabbing on interesting services.
- Email Harvesting: hunter.io → theHarvester → LinkedIn correlation → HIBP breach check.
- Documentation: Compile everything into an attack surface map: IP addresses, hostnames, open ports, software versions, employee names, email addresses, known breach exposure. This becomes the foundation for every subsequent module.
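Because the workflow starts with scope and every later step produces new hosts and addresses, it helps to check each finding against the scope before touching it. This is a minimal sketch using the standard `ipaddress` module; the network range and domain are hypothetical stand-ins for whatever the rules of engagement actually list.

```python
import ipaddress

# Hypothetical scope, as it might be written in the rules of engagement.
SCOPE_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
SCOPE_DOMAINS = ["target.com"]


def ip_in_scope(address: str) -> bool:
    """True if the address falls inside any authorized network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SCOPE_NETWORKS)


def host_in_scope(hostname: str) -> bool:
    """True if the hostname is an authorized domain or a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == domain or host.endswith("." + domain)
               for domain in SCOPE_DOMAINS)


for candidate in ("203.0.113.10", "198.51.100.7"):
    print(candidate, ip_in_scope(candidate))
for candidate in ("dev.target.com", "nottarget.com"):
    print(candidate, host_in_scope(candidate))
```

The hostname check matches on a leading dot so that lookalike domains such as nottarget.com are not treated as subdomains. A subdomain that resolves to a third-party IP may still be out of scope, so check both the name and the address.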
Defense against passive OSINT is genuinely difficult because the information is publicly available. Best practices include: use domain privacy on WHOIS registrations, publish SPF/DKIM/DMARC records so that your domain is harder to spoof in phishing emails (they do not prevent address enumeration), disable DNS zone transfers (restrict AXFR to authorized secondary DNS servers only), monitor GitHub for accidental secret exposure with tools like TruffleHog and GitHub secret scanning, conduct periodic attack surface assessments from the attacker's perspective, and remove sensitive information from job postings (don't list every internal technology you use).
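To give a sense of what secret-scanning tools look for, here is a deliberately small sketch that flags a few well-known patterns in text before it is committed. The three patterns and the sample content are assumptions chosen for illustration. Real scanners such as TruffleHog cover far more formats and verify findings, so this is not a substitute for them.

```python
import re

# Illustrative patterns only; real tools ship hundreds of detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "password_assignment": re.compile(
        r"(?i)password\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line matching a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


# Hypothetical file content; the key is AWS's documented example value.
sample = '''\
DEBUG = False
AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2-example"
'''
print(scan_text(sample))
```

Running a check like this in a pre-commit hook catches the mistake before it reaches a public repository, which is cheaper than rotating credentials afterwards.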
✅ Module 05 Summary
- Professional attackers spend the majority of their time on reconnaissance. Invest in intelligence before touching anything.
- Passive OSINT (WHOIS, Shodan, Google Dorking, Censys, LinkedIn) leaves no trace on target systems. Always start here.
- Active recon (DNS zone transfers, subdomain brute-forcing, directory enumeration) is detectable but yields precise intelligence.
- Email harvesting builds the target list for phishing campaigns. hunter.io + LinkedIn + HIBP is a powerful combination.
- Always work within your authorized scope. Unauthorized active reconnaissance, such as scanning or brute-forcing systems you have no permission to test, can be a criminal offense in many jurisdictions.