A deterministic entropy engine for benchmarking scraper resilience against progressively harder web pages — from raw PRNG bits to fully mutated DOM.
The web scraping arms race has no scoreboard.
Every day, scrapers and anti-bot systems fight an invisible battle across the web. Sites deploy honeypot traps, Shadow DOM encapsulation, timing-based content gating, browser fingerprinting, and dozens of other techniques to detect and block automated access. Scrapers adapt. Sites evolve. Neither side has a reproducible way to measure how hard a page is to scrape — or how resilient a scraper is against a known set of defenses.
Proteus changes that. It provides a deterministic, seed-controlled environment where every anti-scraping technique is cataloged, every page mutation is reproducible, and every scraper can be benchmarked against the exact same challenge — today, tomorrow, or a year from now.
Deterministic randomness: reproducible chaos.
Proteus is built on a simple but powerful idea: if the randomness is deterministic, the chaos is reproducible. Given the same seed, the same entropy level, and the same mutation profiles, Proteus will generate the exact same web page — with the exact same honeypot placement, the exact same obfuscation patterns, the exact same timing traps.
This means you can vary a single parameter — say, raise entropy from 0.3 to 0.8 — and watch the page transform from a simple HTML table into a labyrinth of Shadow DOM nesting, zero-width Unicode obfuscation, and intersection-observer-gated content. Your scraper either handles it or it doesn't, and you know exactly which technique broke it.
Three layers, one pipeline: generate → validate → apply.
The source of deterministic randomness
Three algorithms — Mulberry32 (fast, JS-compatible), Xoshiro256** (high-quality, 256-bit state), and PCG32 (permuted congruential) — each produce a deterministic stream of pseudorandom bytes from a seed. Same seed, same stream, every time.
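For reference, here is a minimal sketch of the three generators as they are commonly published: Mulberry32 follows the widely shared JavaScript version, Xoshiro256** follows Blackman and Vigna's reference C code, and PCG32 follows O'Neill's `pcg32_random_r` (XSH-RR output). The struct names, the SplitMix64 seeding for Xoshiro256**, and the `(seed, stream)` constructor for PCG32 are assumptions for illustration. This is not Proteus's actual `prng` module API or seeding scheme.

```rust
/// Mulberry32: 32-bit state, mirrors the widely used JavaScript version.
pub struct Mulberry32 {
    state: u32,
}

impl Mulberry32 {
    pub fn new(seed: u32) -> Self {
        Self { state: seed }
    }

    pub fn next_u32(&mut self) -> u32 {
        // Advance by a fixed odd increment (a Weyl sequence), then mix the bits.
        self.state = self.state.wrapping_add(0x6D2B_79F5);
        let mut t = self.state;
        t = (t ^ (t >> 15)).wrapping_mul(t | 1);
        t ^= t.wrapping_add((t ^ (t >> 7)).wrapping_mul(t | 61));
        t ^ (t >> 14)
    }
}

/// Xoshiro256**: 256-bit state. Seeding via SplitMix64 is an assumption here.
pub struct Xoshiro256StarStar {
    s: [u64; 4],
}

impl Xoshiro256StarStar {
    pub fn new(seed: u64) -> Self {
        // SplitMix64 expands one u64 seed into four well-mixed state words.
        let mut x = seed;
        let mut splitmix = || {
            x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = x;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            z ^ (z >> 31)
        };
        Self { s: [splitmix(), splitmix(), splitmix(), splitmix()] }
    }

    pub fn next_u64(&mut self) -> u64 {
        // "**" scrambler: multiply, rotate, multiply.
        let result = self.s[1].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }
}

/// PCG32 (XSH-RR): 64-bit LCG state, 32-bit permuted output.
pub struct Pcg32 {
    state: u64,
    inc: u64,
}

impl Pcg32 {
    pub fn new(seed: u64, stream: u64) -> Self {
        // Seeding procedure from the PCG reference implementation.
        let mut rng = Self { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // LCG step; the increment is always odd.
        self.state = old
            .wrapping_mul(6_364_136_223_846_793_005)
            .wrapping_add(self.inc);
        // Output permutation: xorshift high bits, then a data-dependent rotation.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }
}
```

Constructing two generators with the same seed and calling them in lockstep yields identical sequences, which is the property the rest of the pipeline relies on.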
Proving the randomness is sound
Before randomness drives mutations, it must be validated. The Gym runs a battery of statistical tests: Shannon entropy, chi-squared goodness-of-fit, serial correlation, Monte Carlo π estimation, compression ratio, and eight tests from the NIST SP 800-22 suite (Monobit, Block Frequency, Runs, Longest Run, FFT Spectral, Cumulative Sums, Approximate Entropy, Serial). Each test produces a p-value, and the composite gym score (0–100) summarizes how closely a PRNG's output matches what a truly random source would produce.
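Three of the non-NIST checks have simple textbook definitions. The sketch below computes them over a byte stream in the style of tools like `ent`. How Proteus normalizes these values, maps them to p-values, or folds them into the 0–100 score is not described here, so treat the function names and the 4-byte point encoding for the π estimate as assumptions.

```rust
/// Counts occurrences of each byte value 0..=255.
fn byte_counts(data: &[u8]) -> [u64; 256] {
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    counts
}

/// Shannon entropy in bits per byte: H = -sum(p * log2 p). 8.0 is the maximum.
/// Assumes `data` is non-empty.
fn shannon_entropy(data: &[u8]) -> f64 {
    let n = data.len() as f64;
    byte_counts(data)
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Chi-squared statistic against a uniform byte distribution (255 degrees of freedom).
/// Assumes `data` is non-empty.
fn chi_squared(data: &[u8]) -> f64 {
    let expected = data.len() as f64 / 256.0;
    byte_counts(data)
        .iter()
        .map(|&c| {
            let d = c as f64 - expected;
            d * d / expected
        })
        .sum()
}

/// Monte Carlo estimate of pi: each 4-byte chunk becomes a point (x, y) in the
/// unit square; the fraction inside the quarter circle approximates pi / 4.
/// Assumes `data` holds at least one full 4-byte chunk.
fn monte_carlo_pi(data: &[u8]) -> f64 {
    let (mut inside, mut total) = (0u64, 0u64);
    for chunk in data.chunks_exact(4) {
        let x = u16::from_be_bytes([chunk[0], chunk[1]]) as f64 / 65535.0;
        let y = u16::from_be_bytes([chunk[2], chunk[3]]) as f64 / 65535.0;
        if x * x + y * y <= 1.0 {
            inside += 1;
        }
        total += 1;
    }
    4.0 * inside as f64 / total as f64
}
```

A good generator drives entropy toward 8.0 bits per byte, keeps the chi-squared statistic near its 255 degrees of freedom, and pushes the π estimate toward 3.14159 as the sample grows.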
Turning validated entropy into real web pages
The validated PRNG output feeds into 34 scenario templates across four difficulty tiers. Five mutation profiles — structure, layout, copy, timing, behavior — control which DOM axes are randomized. The result: real HTML pages that a scraper must navigate, each deterministically generated from seed + entropy. Serve them via the API, or preview them in the interactive demo.
Each scenario targets a specific scraper challenge. Difficulty scales with entropy — the same page at 0.1 entropy might be trivial; at 0.9, it could defeat production scrapers.
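The catalog does not spell out how entropy maps to concrete mutations. As a purely hypothetical sketch, assume each of the five profiles is switched on independently with probability equal to the entropy level. The structure profile injects a hidden honeypot link at a seed-chosen position, and the copy profile interleaves zero-width spaces into visible text. `Rng`, `active_profiles`, `render`, and `HONEYPOT` are illustrative names, not Proteus's API, and the layout, timing, and behavior axes are not modeled.

```rust
/// Minimal Mulberry32 yielding floats in [0, 1); repeated so the sketch is self-contained.
struct Rng(u32);

impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_add(0x6D2B_79F5);
        let mut t = self.0;
        t = (t ^ (t >> 15)).wrapping_mul(t | 1);
        t ^= t.wrapping_add((t ^ (t >> 7)).wrapping_mul(t | 61));
        (t ^ (t >> 14)) as f64 / 4_294_967_296.0
    }
}

/// The five mutation axes named in the catalog.
#[derive(Debug, Clone, PartialEq)]
enum Profile {
    Structure,
    Layout,
    Copy,
    Timing,
    Behavior,
}

/// Hypothetical rule: each profile is enabled with probability `entropy`.
fn active_profiles(rng: &mut Rng, entropy: f64) -> Vec<Profile> {
    let all = [Profile::Structure, Profile::Layout, Profile::Copy, Profile::Timing, Profile::Behavior];
    all.iter().cloned().filter(|_| rng.next_f64() < entropy).collect()
}

/// Hidden link that a well-behaved scraper should ignore (illustrative markup).
const HONEYPOT: &str = "<li style=\"display:none\"><a href=\"/trap\">offer</a></li>\n";

/// Renders a list of items, deterministically mutated from (seed, entropy).
fn render(seed: u32, entropy: f64, items: &[&str]) -> String {
    let mut rng = Rng(seed);
    let profiles = active_profiles(&mut rng, entropy);
    // Trap position in 0..=items.len(), so it can also land after the last item.
    let trap_at = if profiles.contains(&Profile::Structure) {
        Some((rng.next_f64() * (items.len() + 1) as f64) as usize)
    } else {
        None
    };
    let obfuscate = profiles.contains(&Profile::Copy);
    let mut html = String::from("<ul>\n");
    for (i, item) in items.iter().enumerate() {
        if trap_at == Some(i) {
            html.push_str(HONEYPOT);
        }
        // Copy profile: put a zero-width space (U+200B) after every character.
        let text: String = if obfuscate {
            item.chars().flat_map(|c| vec![c, '\u{200B}']).collect()
        } else {
            item.to_string()
        };
        html.push_str(&format!("<li>{}</li>\n", text));
    }
    if trap_at == Some(items.len()) {
        html.push_str(HONEYPOT);
    }
    html.push_str("</ul>\n");
    html
}
```

At entropy 0.0 no profile fires and the list is plain HTML. At 1.0 every profile fires, so a naive scraper both follows the hidden link and extracts text polluted with invisible characters. The same seed always reproduces the same page.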
For: Cryptographers, PRNG authors, QA engineers
Benchmark the statistical quality of pseudorandom number generators. Configure the algorithm, seed, sample size, and significance level. Get a composite score (0–100), NIST SP 800-22 test results, and interactive visualizations, and compare algorithms side by side.
For: Scraper developers, anti-bot engineers, QA teams
Resolve all 34 scenarios at your chosen entropy level. Browse the catalog, filter by category, and click into interactive demos that show exactly how each technique mutates the DOM. See what your scraper is up against.
For: Automated pipelines, CI/CD, scraper test suites
The Proteus Lab Server at proteus.terrabench.io serves every scenario as a real HTML page. Point your scraper at /lab/honeypot?seed=42&entropy=0.7 to get a fully rendered page with deterministic mutations. JSON endpoints are also available for inspection, resolution, and scripting.
curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.7"
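For a CI test suite, a small helper can expand the documented URL pattern into a sweep of entropy levels, so one scraper run covers the whole difficulty range for a fixed seed. This sketch assumes every scenario is served at `/lab/<scenario>?seed=<seed>&entropy=<level>` like the honeypot example. Slugs other than `honeypot` are not confirmed here. `lab_url` and `entropy_sweep` are hypothetical helpers, and fetching the URLs is left to whatever HTTP client the suite already uses.

```rust
use std::ops::RangeInclusive;

/// Base URL of the public lab server, from the example above.
const LAB_BASE: &str = "https://proteus.terrabench.io/lab";

/// Builds one lab URL, assuming the honeypot URL pattern applies to every scenario.
fn lab_url(scenario: &str, seed: u32, entropy: f64) -> String {
    format!("{}/{}?seed={}&entropy={}", LAB_BASE, scenario, seed, entropy)
}

/// Expands entropy levels given in tenths (e.g. 1..=9 -> 0.1..0.9) into URLs.
fn entropy_sweep(scenario: &str, seed: u32, tenths: RangeInclusive<u32>) -> Vec<String> {
    tenths
        .map(|t| lab_url(scenario, seed, t as f64 / 10.0))
        .collect()
}
```

Because every URL pins both seed and entropy, a failure at, say, 0.7 can be replayed later against exactly the same page.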
Pure Rust library — zero unsafe, no allocator dependencies. Five modules: prng, entropy, profiles, scenarios, scoring.
Compiled to WASM via wasm-bindgen. Runs in-browser at near-native speed. ~114 KB payload.
Same seed + same parameters = same output. Always. Every PRNG, every scenario, every mutation. 54 tests verify it.
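One common way to guard this kind of determinism is a golden-digest test. Serialize the generator's output with an explicit byte order so native and WASM builds produce identical bytes, hash it, and compare the hash against a value recorded once. The sketch below uses Mulberry32 and 64-bit FNV-1a. The helper names and the digest approach are assumptions for illustration and do not describe how Proteus's 54 tests are written.

```rust
/// One Mulberry32 step (self-contained copy).
fn mulberry32(state: &mut u32) -> u32 {
    *state = state.wrapping_add(0x6D2B_79F5);
    let mut t = *state;
    t = (t ^ (t >> 15)).wrapping_mul(t | 1);
    t ^= t.wrapping_add((t ^ (t >> 7)).wrapping_mul(t | 61));
    t ^ (t >> 14)
}

/// Serializes `n` outputs as little-endian bytes, independent of host endianness.
fn byte_stream(seed: u32, n: usize) -> Vec<u8> {
    let mut state = seed;
    (0..n).flat_map(|_| mulberry32(&mut state).to_le_bytes()).collect()
}

/// 64-bit FNV-1a hash, used as a compact fingerprint of a byte stream.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}
```

In a real suite, the digest printed by a first run would be stored as a constant and asserted on every target. Any change to the algorithm, seeding, or serialization then fails loudly.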
Standalone HTTP server serves all 34 scenarios as scrapable HTML. JSON API for programmatic access. CORS-enabled.
Select PRNG algorithm, seed, and entropy parameters to run the deterministic benchmark engine.
| Metric | Xoshiro256** | PCG32 | Mulberry32 |
|---|---|---|---|
Statistical Test Suite for Random and Pseudorandom Number Generators
| Test | Statistic | p-value | Verdict |
|---|---|---|---|
Visual analysis of PRNG output distribution and correlation structure.
Frequency of each byte value (0–255). Uniform = random.
Consecutive byte pairs as (x, y) coordinates. Uniform fill = random.
Correlation at increasing lags. Values near 0 = independent.
Proportion of 1-bits vs 0-bits. Ideal: 50/50.
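The quantities behind these charts are short computations over the raw byte stream. The sketch below shows the byte histogram, the sample autocorrelation at a given lag, the proportion of 1-bits, and the NIST SP 800-22 Frequency (Monobit) p-value, p = erfc(|S_n| / sqrt(2n)). Stable Rust's `std` has no `erfc`, so this sketch swaps in the Numerical Recipes rational approximation (fractional error below about 1.2e-7). That choice, and all function names, are assumptions rather than Proteus's implementation.

```rust
/// Byte histogram: count of each value 0..=255.
fn histogram(data: &[u8]) -> [usize; 256] {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    counts
}

/// Sample autocorrelation at `lag`; returns 0.0 for constant or too-short data.
fn autocorrelation(data: &[u8], lag: usize) -> f64 {
    let n = data.len();
    if lag >= n {
        return 0.0;
    }
    let mean = data.iter().map(|&b| b as f64).sum::<f64>() / n as f64;
    let dev = |i: usize| data[i] as f64 - mean;
    let denom: f64 = (0..n).map(|i| dev(i) * dev(i)).sum();
    if denom == 0.0 {
        return 0.0;
    }
    let num: f64 = (0..n - lag).map(|i| dev(i) * dev(i + lag)).sum();
    num / denom
}

/// Fraction of 1-bits across all bytes (ideal: 0.5). Assumes non-empty data.
fn ones_proportion(data: &[u8]) -> f64 {
    let ones: u64 = data.iter().map(|b| b.count_ones() as u64).sum();
    ones as f64 / (data.len() * 8) as f64
}

/// Complementary error function (Numerical Recipes `erfcc` approximation).
fn erfc(x: f64) -> f64 {
    let z = x.abs();
    let t = 1.0 / (1.0 + 0.5 * z);
    let exponent = -z * z - 1.26551223
        + t * (1.00002368 + t * (0.37409196 + t * (0.09678418
        + t * (-0.18628806 + t * (0.27886807 + t * (-1.13520398
        + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277))))))));
    let ans = t * exponent.exp();
    if x >= 0.0 { ans } else { 2.0 - ans }
}

/// Monobit test: S_n = (#ones - #zeros), p = erfc(|S_n| / sqrt(n) / sqrt(2)).
/// Assumes non-empty data.
fn monobit_p_value(data: &[u8]) -> f64 {
    let n = (data.len() * 8) as f64;
    let ones: u64 = data.iter().map(|b| b.count_ones() as u64).sum();
    let s_n = 2.0 * ones as f64 - n;
    let s_obs = s_n.abs() / n.sqrt();
    erfc(s_obs / std::f64::consts::SQRT_2)
}
```

A balanced bit stream gives a Monobit p-value near 1, while a heavily biased one drives it toward 0. NIST SP 800-22 treats p-values below the chosen significance level (commonly 0.01) as a failure.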
Test your scrapers against 34 entropy-driven scenarios. Each page serves real, mutated HTML — via the dashboard or the API.
Click "Resolve All Scenarios" to generate the scenario catalog, then click "Demo" to preview — or hit the API to scrape them programmatically.