11. Anti-Detection & Stealth
Operating Without Detection
Websites actively protect themselves from automated access using a multi-layered detection stack: IP reputation checks, browser fingerprinting, behavioral analysis, CAPTCHA systems, and WAF (Web Application Firewall) rules. Understanding how each detection layer works is the prerequisite to circumventing it legitimately. This knowledge is also essential for defensive security: understanding attacker techniques helps defenders build better detection.
Important: The techniques in this module are for scraping publicly available data and authorized automation testing. Using them to bypass authentication, access private data, or violate platform terms of service is unethical and may be illegal, depending on the jurisdiction.
How Websites Detect Bots
Detection is layered. Each layer provides a signal; enough signals combined result in a block. Understanding each layer allows targeted countermeasures:
- IP Reputation: Your IP's history of abuse. Datacenter IPs (AWS, GCP, DigitalOcean ranges) are immediately suspicious. Residential IPs from ISPs have much better reputation. Services like MaxMind flag IPs that have appeared in abuse databases.
- Browser Fingerprinting: The combination of your browser's navigator properties forms a unique fingerprint. Headless Chrome has distinctive values: navigator.webdriver=true, navigator.plugins=[] (no plugins), navigator.languages=['en-US'] only, certain screen dimensions, missing battery API. Sites like bot.sannysoft.com reveal your fingerprint.
- Behavioral Analysis: Human mouse movements follow curved paths (often modeled as Bezier curves), while bots move in straight lines. Human scrolling has momentum; bot scrolls are instantaneous. Bots fill form fields at consistent intervals; humans vary. Bots also tend to leave a page after an implausibly short dwell time.
- Request Patterns: Same exact timing between requests. Sequential page numbers. No Referer header on deep page loads. No image/CSS/JS requests (only HTML). Perfectly regular user-agent rotation.
- JavaScript Challenges: Sites run JavaScript that checks for headless browser indicators, requires proof-of-work computation, or sets cookies via JS that pure HTTP clients can't receive.
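To make the idea that "enough signals combined result in a block" concrete, here is a minimal sketch of how a defender might add up per-layer signals. The signal names, point values, and threshold are all hypothetical, chosen for illustration and not taken from any real product.

```python
# Hypothetical points per detection signal (assumed values, not from a real system).
SIGNAL_WEIGHTS = {
    "datacenter_ip": 40,      # IP reputation layer
    "webdriver_flag": 50,     # browser fingerprinting layer
    "no_static_assets": 20,   # request-pattern layer: only HTML was fetched
    "regular_timing": 30,     # request-pattern layer: identical gaps between requests
    "missing_referer": 10,    # request-pattern layer: deep page with no Referer
}
BLOCK_THRESHOLD = 70  # assumed cut-off out of 100


def bot_score(signals: dict[str, bool]) -> int:
    """Add the points of every signal that fired, capped at 100."""
    total = sum(points for name, points in SIGNAL_WEIGHTS.items()
                if signals.get(name, False))
    return min(100, total)


def should_block(signals: dict[str, bool]) -> bool:
    """A single weak signal is tolerated; several together cross the threshold."""
    return bot_score(signals) >= BLOCK_THRESHOLD
```

With these assumed values, a datacenter IP alone scores 40 and passes, while a datacenter IP combined with the webdriver flag scores 90 and is blocked. That is why countermeasures have to address several layers at once.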
Browser Fingerprint Evasion
from playwright.sync_api import sync_playwright


def create_undetectable_context(playwright):
    """Launch Chromium and return (browser, context) with common automation tells patched."""
    browser = playwright.chromium.launch(
        headless=True,
        args=[
            '--no-sandbox',
            '--disable-blink-features=AutomationControlled',  # removes automation flag
            '--disable-features=IsolateOrigins,site-per-process',
        ]
    )
    context = browser.new_context(
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
        viewport={'width': 1366, 'height': 768},  # common resolution
        locale='en-US',
        timezone_id='America/New_York',
        # Grant geolocation and keep the location consistent with the timezone above
        permissions=['geolocation'],
        geolocation={'latitude': 40.7128, 'longitude': -74.0060},  # New York
        color_scheme='light',
    )
    # Runs before any page script: patch the properties detectors commonly read
    context.add_init_script('''
        Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
        // Crude stand-in: only makes navigator.plugins.length non-zero
        Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});
        Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
        window.chrome = {runtime: {}};
    ''')
    return browser, context


with sync_playwright() as p:
    browser, context = create_undetectable_context(p)
    page = context.new_page()
    # Test your fingerprint
    page.goto('https://bot.sannysoft.com')
    page.wait_for_timeout(2000)
    page.screenshot(path='fingerprint_test.png', full_page=True)
    browser.close()

Proxy Management
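Because datacenter IPs are flagged by the IP reputation layer, it can help to check a proxy's address against known datacenter ranges before adding it to a pool. The sketch below uses Python's standard `ipaddress` module. The ranges are placeholders (the RFC 5737 documentation networks), not real provider ranges, and `looks_like_datacenter` is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges only: these are RFC 5737 documentation networks.
# A real list would be built from ranges that cloud providers publish.
ASSUMED_DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def looks_like_datacenter(ip: str) -> bool:
    """Return True if the address falls inside any of the assumed ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in ASSUMED_DATACENTER_RANGES)
```

A proxy whose address matches could be skipped or given a lower priority, since it is likely to start with a worse reputation than a residential address.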
import random
import requests
from typing import List, Optional
from dataclasses import dataclass
from datetime import datetime
from playwright.sync_api import sync_playwright


@dataclass
class Proxy:
    host: str
    port: int
    username: str = ''
    password: str = ''
    proxy_type: str = 'http'  # http, https, socks5 (socks5 with requests needs the requests[socks] extra)
    failures: int = 0
    last_used: Optional[datetime] = None

    @property
    def url(self) -> str:
        if self.username:
            return f'{self.proxy_type}://{self.username}:{self.password}@{self.host}:{self.port}'
        return f'{self.proxy_type}://{self.host}:{self.port}'

    @property
    def dict(self) -> dict:
        # Mapping in the shape that requests expects for its proxies= argument
        return {'http': self.url, 'https': self.url}


class ProxyPool:
    def __init__(self, proxies: List[Proxy], max_failures: int = 3):
        self.proxies = proxies
        self.max_failures = max_failures

    def get_proxy(self) -> Proxy:
        # Only proxies below the failure limit are eligible
        healthy = [p for p in self.proxies if p.failures < self.max_failures]
        if not healthy:
            raise RuntimeError('All proxies exhausted')
        return random.choice(healthy)

    def report_failure(self, proxy: Proxy):
        proxy.failures += 1
        if proxy.failures >= self.max_failures:
            print(f'Proxy {proxy.host} marked as failed after {proxy.failures} errors')

    def report_success(self, proxy: Proxy):
        proxy.failures = max(0, proxy.failures - 1)  # partial recovery on success
        proxy.last_used = datetime.now()


# Using proxies with requests
pool = ProxyPool([Proxy('proxy1.example.com', 8080, 'user', 'pass')])
proxy = pool.get_proxy()
try:
    resp = requests.get('https://target.com', proxies=proxy.dict, timeout=15)
    pool.report_success(proxy)
except requests.RequestException:
    pool.report_failure(proxy)

# Using proxies with Playwright: credentials go in separate fields, not in the server URL
with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={
            'server': f'{proxy.proxy_type}://{proxy.host}:{proxy.port}',
            'username': proxy.username,
            'password': proxy.password,
        }
    )
    browser.close()

Human Behavior Simulation
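Behavioral analysis distinguishes curved human pointer paths from straight-line bot movement. The sketch below generates a curved path between two points with a cubic Bezier curve. The function name `bezier_path`, the `spread` parameter, and the way control points are randomized are assumptions made for illustration; real pointer movement also varies in speed, which this sketch does not model.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 25,
                spread: float = 100.0,
                rng: Optional[random.Random] = None) -> List[Point]:
    """Return steps + 1 points along a cubic Bezier curve from start to end."""
    if rng is None:
        rng = random.Random()
    (x0, y0), (x3, y3) = start, end
    # Two control points near the straight line, each nudged by a random offset
    # so that every generated path bends differently.
    x1 = x0 + (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y1 = y0 + (y3 - y0) / 3 + rng.uniform(-spread, spread)
    x2 = x0 + 2 * (x3 - x0) / 3 + rng.uniform(-spread, spread)
    y2 = y0 + 2 * (y3 - y0) / 3 + rng.uniform(-spread, spread)

    points = []
    for i in range(steps + 1):
        t = i / steps
        # Cubic Bernstein basis weights for parameter t
        a, b, c, d = (1 - t) ** 3, 3 * (1 - t) ** 2 * t, 3 * (1 - t) * t ** 2, t ** 3
        points.append((a * x0 + b * x1 + c * x2 + d * x3,
                       a * y0 + b * y1 + c * y2 + d * y3))
    return points
```

Each point could then be passed to Playwright's `page.mouse.move(x, y)` with a short random pause between calls, so the pointer follows the curve instead of jumping straight to the target.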
import asyncio
import random
import math
from typing import Optional
from playwright.async_api import Page


async def human_type(page: Page, selector: str, text: str):
    '''Type like a human: variable delay between keystrokes'''
    await page.click(selector)
    for char in text:
        await page.keyboard.type(char)
        # Variable delay between keystrokes, clamped to 80-250 ms
        delay = random.gauss(140, 40)
        await asyncio.sleep(min(250, max(80, delay)) / 1000)


async def human_scroll(page: Page, target_y: Optional[int] = None):
    '''Scroll in eased steps to approximate human momentum'''
    current_y = await page.evaluate('window.scrollY')
    # Explicit None check so that target_y=0 (scroll to top) is honored
    if target_y is None:
        target = await page.evaluate('document.body.scrollHeight')
    else:
        target = target_y
    steps = random.randint(5, 10)
    for i in range(steps):
        progress = (i + 1) / steps
        # Ease-in-out curve
        eased = 0.5 - math.cos(progress * math.pi) / 2
        scroll_to = current_y + (target - current_y) * eased
        await page.evaluate(f'window.scrollTo(0, {scroll_to})')
        await asyncio.sleep(random.uniform(0.05, 0.2))


async def random_delay(min_ms: int = 1000, max_ms: int = 4000):
    '''Random delay with a Gaussian distribution, more natural than uniform'''
    mean = (min_ms + max_ms) / 2
    std = (max_ms - min_ms) / 6
    # Clamp to [min_ms, max_ms] so outliers never fall outside the requested range
    delay = max(min_ms, min(max_ms, random.gauss(mean, std)))
    await asyncio.sleep(delay / 1000)

Target Server Defense
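Seen from the server side, the request-pattern signal of identical timing between requests can be checked by measuring how much the gaps between a client's requests vary. The sketch below flags a client when the coefficient of variation of those gaps is very low. The function name and both thresholds are assumptions made for illustration, not values from any real detection product.

```python
import statistics
from typing import List


def timing_looks_automated(timestamps: List[float], min_requests: int = 5,
                           cv_threshold: float = 0.1) -> bool:
    """Flag request times (in seconds) whose gaps are suspiciously uniform."""
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return True  # every request arrived at the same instant
    # Coefficient of variation: standard deviation relative to the mean gap.
    cv = statistics.pstdev(gaps) / mean_gap
    return cv < cv_threshold
```

Requests spaced exactly two seconds apart produce a coefficient of variation of zero and are flagged, while irregular, human-like gaps are not. This is the check that the Gaussian `random_delay` approach is meant to avoid triggering.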