Proteus Entropy Benchmark Suite

A deterministic entropy engine for benchmarking scraper resilience against progressively harder web pages — from raw PRNG bits to fully mutated DOM.

The Problem

The web scraping arms race has no scoreboard.

Every day, scrapers and anti-bot systems fight an invisible battle across the web. Sites deploy honeypot traps, Shadow DOM encapsulation, timing-based content gating, browser fingerprinting, and dozens of other techniques to detect and block automated access. Scrapers adapt. Sites evolve. Neither side has a reproducible way to measure how hard a page is to scrape — or how resilient a scraper is against a known set of defenses.

Proteus changes that. It provides a deterministic, seed-controlled environment where every anti-scraping technique is cataloged, every page mutation is reproducible, and every scraper can be benchmarked against the exact same challenge — today, tomorrow, or a year from now.

The Core Insight

Deterministic randomness: reproducible chaos.

Proteus is built on a simple but powerful idea: if the randomness is deterministic, the chaos is reproducible. Given the same seed, the same entropy level, and the same mutation profiles, Proteus will generate the exact same web page — with the exact same honeypot placement, the exact same obfuscation patterns, the exact same timing traps.

This means you can vary a single parameter — say, raise entropy from 0.3 to 0.8 — and watch the page transform from a simple HTML table into a labyrinth of Shadow DOM nesting, zero-width Unicode obfuscation, and intersection-observer-gated content. Your scraper either handles it or it doesn't, and you know exactly which technique broke it.
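The idea can be illustrated with a toy generator. This is a minimal sketch, not Proteus code: `render_page`, `fingerprint`, and the sample rows are hypothetical names, and Python's built-in `random.Random` stands in for the Proteus PRNG engine. The point it shows is that every random decision is drawn from one seeded stream, and entropy only moves the threshold at which a mutation fires.

```python
import hashlib
import random

ROWS = [("Widget", "9.99"), ("Gadget", "19.50"), ("Gizmo", "4.25")]


def render_page(seed: int, entropy: float) -> str:
    """Toy generator: every random decision comes from one seeded stream."""
    rng = random.Random(seed)  # stand-in for a Proteus PRNG (assumption)
    rows = list(ROWS)
    rng.shuffle(rows)  # row order depends only on the seed
    lines = ["<table>"]
    for name, price in rows:
        # Always draw both values so the stream position is identical at
        # every entropy level; only the decision threshold changes.
        roll = rng.random()
        decoy_class = f"c{rng.randrange(1000)}"
        if roll < entropy:
            price = f'<span class="{decoy_class}">{price}</span>'
        lines.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)


def fingerprint(page: str) -> str:
    """Short hash for comparing two generated pages."""
    return hashlib.sha256(page.encode("utf-8")).hexdigest()[:16]
```

Calling `fingerprint(render_page(42, 0.7))` twice gives the same hash, while raising entropy from 0.3 to 0.8 with the same seed keeps the row order and only adds wrappers. That is the "vary one parameter" property described above.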

How It Works

Three layers, one pipeline: generate → validate → apply.

1. PRNG Engine

The source of deterministic randomness

Three algorithms — Mulberry32 (fast, JS-compatible), Xoshiro256** (high-quality, 256-bit state), and PCG32 (permuted congruential) — each produce a deterministic stream of pseudorandom bytes from a seed. Same seed, same stream, every time.
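As an illustration of "same seed, same stream", here is a sketch of Mulberry32 in Python, following the widely circulated JavaScript formulation with explicit 32-bit masking. Whether Proteus's Rust implementation matches this bit-for-bit is an assumption, and the little-endian byte packing in `random_bytes` is a hypothetical choice made for this example.

```python
from itertools import islice
from typing import Iterator

MASK32 = 0xFFFFFFFF


def mulberry32(seed: int) -> Iterator[int]:
    """Yield 32-bit outputs of Mulberry32, emulating uint32 wraparound."""
    state = seed & MASK32
    while True:
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        yield (t ^ (t >> 14)) & MASK32


def random_bytes(seed: int, n: int) -> bytes:
    """Turn the 32-bit stream into n bytes (little-endian per word, assumed)."""
    words = islice(mulberry32(seed), (n + 3) // 4)
    return b"".join(w.to_bytes(4, "little") for w in words)[:n]
```

Two calls to `random_bytes(42, 16)` return identical bytes, and a shorter request is always a prefix of a longer one from the same seed.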

Mulberry32, Xoshiro256**, PCG32

2. Statistical Validation (Gym)

Proving the randomness is sound

Before randomness drives mutations, it must be validated. The Gym runs a battery of statistical checks: Shannon entropy, chi-squared goodness-of-fit, serial correlation, Monte Carlo π estimation, compression ratio, and a set of NIST SP 800-22 tests (Monobit, Block Frequency, Runs, Longest Run, FFT Spectral, Cumulative Sums, Approximate Entropy, Serial). The hypothesis tests each produce a p-value; the composite gym score (0–100) summarizes how well a PRNG's output holds up across the battery.
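Three of these checks are small enough to sketch directly. The functions below follow the textbook definitions (Shannon entropy per byte, Pearson's chi-squared statistic against a uniform byte distribution, and the NIST SP 800-22 frequency/monobit p-value). The function names are hypothetical, and how the Gym weights results into its composite score is not shown because the source does not specify it.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 8.0 is the maximum for byte data."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def chi_squared_statistic(data: bytes) -> float:
    """Pearson chi-squared statistic against a uniform byte distribution."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts[b] - expected) ** 2 / expected for b in range(256))


def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (monobit) test p-value."""
    n_bits = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    s = abs(2 * ones - n_bits)  # |sum of +1/-1 over all bits|
    return math.erfc(s / math.sqrt(n_bits) / math.sqrt(2))
```

A perfectly balanced input such as `b"\xaa" * 100` has equal ones and zeros, so its monobit p-value is 1.0, while an all-zero buffer scores an entropy of 0 and a p-value near 0. A balanced bit count alone says nothing about structure, which is why a battery of tests is used rather than any single one.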

Shannon Entropy, Chi-Squared, Monte Carlo π, NIST SP 800-22, FFT Spectral, Gym Score 0–100

3. DOM Application (Lab)

Turning validated entropy into real web pages

The validated PRNG output feeds into 34 scenario templates across four difficulty tiers. Five mutation profiles — structure, layout, copy, timing, behavior — control which DOM axes are randomized. The result: real HTML pages that a scraper must navigate, each deterministically generated from seed + entropy. Serve them via the API, or preview them in the interactive demo.
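One way to picture how profiles gate mutations is sketched below. Everything except the five profile names is hypothetical: the per-profile decisions (`wrapper_depth`, `column_order`, and so on) are invented for illustration, and deriving a separate stream per profile from a hash of seed and profile name is a design assumption, not a documented Proteus behavior. It shows one property worth having: toggling one profile does not disturb the output of the others.

```python
import hashlib
import random
from typing import Dict, Iterable

PROFILES = ("structure", "layout", "copy", "timing", "behavior")


def profile_rng(seed: int, profile: str) -> random.Random:
    """Derive an independent, reproducible stream for one profile."""
    digest = hashlib.sha256(f"{seed}:{profile}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def plan_mutations(seed: int, entropy: float, enabled: Iterable[str]) -> Dict[str, dict]:
    """Return the mutation decisions for each enabled profile."""
    enabled = set(enabled)
    unknown = enabled - set(PROFILES)
    if unknown:
        raise ValueError(f"unknown profiles: {sorted(unknown)}")
    plan: Dict[str, dict] = {}
    for profile in PROFILES:
        if profile not in enabled:
            continue
        rng = profile_rng(seed, profile)
        if profile == "structure":
            plan[profile] = {"wrapper_depth": rng.randint(0, int(entropy * 6))}
        elif profile == "layout":
            columns = ["name", "price", "stock"]
            if rng.random() < entropy:
                rng.shuffle(columns)
            plan[profile] = {"column_order": columns}
        elif profile == "copy":
            plan[profile] = {"label_variant": rng.randrange(1 + int(entropy * 4))}
        elif profile == "timing":
            plan[profile] = {"reveal_delay_ms": int(rng.random() * entropy * 2000)}
        else:  # behavior
            plan[profile] = {"requires_scroll": rng.random() < entropy}
    return plan
```

With this layout, `plan_mutations(42, 0.7, ["copy"])` returns the same `copy` decision as a run with all five profiles enabled, so a failure can be attributed to a single axis.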

7 Basic, 9 Scraper, 10 Anti-Bot, 8 Expert, 5 Mutation Profiles

The 34 Scenarios

Each scenario targets a specific scraper challenge. Difficulty scales with entropy — the same page at 0.1 entropy might be trivial; at 0.9, it could defeat production scrapers.

Basic

  • Card Grid — variable count & ordering
  • Data Table — row/column mutations
  • Form — field mutation & label changes
  • Infinite Feed — lazy-loading & virtual lists
  • Modals & Tooltips — portals & z-stacking
  • Layout Stress — deeply nested containers
  • Structured Data — JSON-LD, Open Graph, microdata

Scraper

  • Pagination — numbered, load-more, infinite
  • Shadow DOM — open, closed, nested
  • AJAX/XHR — fetch, dynamic imports, websockets
  • Iframe Nesting — srcdoc, data-uri, nested
  • Text Obfuscation — unicode, zero-width, entities
  • CSS Exfiltration — data in CSS, not DOM text
  • WebSocket Stream — data via WS, not HTTP
  • SPA Shell — empty HTML, JS-rendered content
  • CSV Export — file download with decoy HTML preview
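To make one of these concrete from the scraper's side: the Text Obfuscation scenario lists unicode, zero-width characters, and entities. A scraper-side cleanup pass might look like the sketch below, which uses only the Python standard library. The function name and the exact set of zero-width code points handled are assumptions; the source does not say which characters Proteus injects.

```python
import html
import unicodedata

# Zero-width and invisible joiner characters mapped to None for deletion.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def normalize_text(raw: str) -> str:
    """Undo common text obfuscation before comparing scraped values."""
    text = html.unescape(raw)                    # decode &#36;, &amp;, &#8203; ...
    text = text.translate(ZERO_WIDTH)            # drop zero-width characters
    text = unicodedata.normalize("NFKC", text)   # fold fullwidth/compatibility forms
    return " ".join(text.split())                # collapse runs of whitespace
```

Entities are decoded first because a zero-width character can itself arrive as an entity. For example, `normalize_text("Pr\u200bice: &#36;1\u200d9.99")` returns `"Price: $19.99"`.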

Anti-Bot

  • Honeypot — invisible fields, CSS hiding, aria traps
  • Timing Traps — setTimeout, rAF, IntersectionObserver
  • Fingerprint — canvas, WebGL, audio
  • Headless Detection — webdriver, plugins, runtime
  • Session & Cookies — CSRF, visit-gating, storage
  • Font Cipher — shuffled @font-face glyph mapping
  • Decoy Injection — plausible fake data mixed in
  • DOM Sentinel — MutationObserver + property traps
  • Request Fingerprint — header & TLS validation
  • Rate Limit Guard — timing enforcement, content scrambling
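For the Honeypot scenario, a scraper that fills forms needs some way to tell real fields from traps. The sketch below is a deliberately simple heuristic built on Python's `html.parser`: it flags inputs hidden through inline styles, `aria-hidden`, `tabindex="-1"`, or the boolean `hidden` attribute. The class name and the list of hints are hypothetical, and hiding done through external stylesheets or scripts would need a real rendering engine to detect.

```python
from html.parser import HTMLParser

HIDING_HINTS = ("display:none", "visibility:hidden")


class FormFieldAudit(HTMLParser):
    """Sort named <input> fields into likely-safe and likely-honeypot."""

    def __init__(self) -> None:
        super().__init__()
        self.safe = []
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if not name:
            return
        style = (attributes.get("style") or "").replace(" ", "").lower()
        hidden = (
            any(hint in style for hint in HIDING_HINTS)
            or attributes.get("aria-hidden") == "true"
            or attributes.get("tabindex") == "-1"
            or "hidden" in attributes  # boolean attribute, not type="hidden"
        )
        (self.suspect if hidden else self.safe).append(name)
```

Note that `type="hidden"` inputs are treated as safe, since forms legitimately use them for values such as CSRF tokens. After `audit.feed(page_html)`, `audit.suspect` lists the fields a scraper should leave empty.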

Expert

  • Mouse Behavior — velocity, curvature, click timing
  • CAPTCHA — math, slider, image-select, rotation
  • Canvas Rendering — data painted on Canvas2D
  • Proof of Work — Cloudflare-style JS challenge
  • Polymorphic Markup — different DOM every seed
  • Multi-Page Journey — 3-page token state machine
  • Multi-Step Workflow — 5-step login→checkout RPA flow
  • SVG Data Rendering — data in SVG elements, no HTML text
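As an example from the Expert tier, a proof-of-work gate typically asks the client to find a nonce whose hash falls below a target. The sketch below is a generic hashcash-style solver using SHA-256. The challenge format (`"challenge:nonce"`), the difficulty parameter, and both function names are assumptions for illustration; the source does not document how the Proteus Proof of Work scenario encodes its challenge.

```python
import hashlib
from itertools import count


def verify_pow(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(challenge:nonce) has at least difficulty_bits leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def solve_pow(challenge: str, difficulty_bits: int) -> int:
    """Brute-force the smallest nonce that satisfies the difficulty."""
    for nonce in count():
        if verify_pow(challenge, nonce, difficulty_bits):
            return nonce
```

Expected work doubles with each extra difficulty bit (about 2^difficulty_bits hashes on average), which is what makes this kind of gate costly for high-volume scrapers while staying cheap to verify.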

Choose Your Path

Gym

For: Cryptographers, PRNG authors, QA engineers

Benchmark the statistical quality of pseudorandom number generators. Configure algorithm, seed, sample size, and significance level. Get a composite score (0–100), per-test NIST SP 800-22 results, and interactive visualizations. Compare algorithms side-by-side.
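The four knobs named here (algorithm, seed, sample size, significance level) can be seen working together in a small Monte Carlo π check. This is a sketch under stated assumptions: Python's Mersenne Twister (`mt19937`) and a deliberately broken `biased` generator stand in for the Proteus algorithms, and the p-value comes from a normal approximation to the binomial count of points inside the quarter circle, which may differ from how the Gym scores this test.

```python
import math
import random
from typing import Callable


def make_generator(algorithm: str, seed: int) -> Callable[[], float]:
    """Return a function producing floats in [0, 1) for the named algorithm."""
    rng = random.Random(seed)
    if algorithm == "mt19937":
        return rng.random
    if algorithm == "biased":
        return lambda: rng.random() * 0.5  # broken on purpose: never exceeds 0.5
    raise ValueError(f"unknown algorithm: {algorithm}")


def pi_test(algorithm: str, seed: int, sample_size: int, alpha: float = 0.01) -> dict:
    """Estimate pi from random points and test the hit count against pi/4."""
    next_float = make_generator(algorithm, seed)
    inside = 0
    for _ in range(sample_size):
        x, y = next_float(), next_float()
        if x * x + y * y <= 1.0:
            inside += 1
    p_inside = math.pi / 4  # probability a uniform point lands in the quarter circle
    std = math.sqrt(sample_size * p_inside * (1 - p_inside))
    z = (inside - sample_size * p_inside) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return {
        "algorithm": algorithm,
        "pi_estimate": 4.0 * inside / sample_size,
        "p_value": p_value,
        "passed": p_value >= alpha,
    }
```

Running `pi_test` for each algorithm with the same seed and sample size gives a side-by-side comparison. The `biased` generator always lands inside the circle, so its estimate is 4.0 and it fails at any reasonable significance level.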

Lab

For: Scraper developers, anti-bot engineers, QA teams

Resolve all 34 scenarios at your chosen entropy level. Browse the catalog, filter by category, and click into interactive demos that show exactly how each technique mutates the DOM. See what your scraper is up against.

API

For: Automated pipelines, CI/CD, scraper test suites

The Proteus Lab Server at proteus.terrabench.io serves every scenario as a real HTML page. Point your scraper at /lab/honeypot?seed=42&entropy=0.7 and get a fully rendered page with deterministic mutations. JSON endpoints for inspection, resolution, and scripting.

curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.7"
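For a scraper test suite, the same request can be made from Python. The sketch below builds the URL from the path and query parameters shown above and hashes the response body so two runs can be compared. The helper names are hypothetical, and the assumption that the body is byte-for-byte identical across requests (no timestamps or per-request tokens) should be checked against the live server.

```python
import hashlib
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://proteus.terrabench.io"


def lab_url(scenario: str, seed: int, entropy: float) -> str:
    """Build the Lab URL for a scenario with explicit seed and entropy."""
    query = urlencode({"seed": seed, "entropy": entropy})
    return f"{BASE_URL}/lab/{scenario}?{query}"


def fetch_fingerprint(scenario: str, seed: int, entropy: float, timeout: float = 10.0) -> str:
    """Fetch the page and return a SHA-256 hex digest of its body."""
    url = lab_url(scenario, seed, entropy)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return hashlib.sha256(response.read()).hexdigest()
```

In a CI job, comparing `fetch_fingerprint("honeypot", 42, 0.7)` against a stored digest would flag any change in the generated page before scraper results are interpreted.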

Technical Foundation

Rust Core

Pure Rust library — zero unsafe, no allocator dependencies. Five modules: prng, entropy, profiles, scenarios, scoring.

WebAssembly

Compiled to WASM via wasm-bindgen. Runs in-browser at near-native speed. ~114 KB payload.

Deterministic

Same seed + same parameters = same output. Always. Every PRNG, every scenario, every mutation. 54 tests verify it.

Axum Server

Standalone HTTP server serves all 34 scenarios as scrapable HTML. JSON API for programmatic access. CORS-enabled.

Configure Benchmark

Select PRNG algorithm, seed, and entropy parameters to run the deterministic benchmark engine.

NIST SP 800-22

Statistical Test Suite for Random and Pseudorandom Number Generators

Visualizations

Visual analysis of PRNG output distribution and correlation structure.

Scenario Lab

Test your scrapers against 34 entropy-driven scenarios. Each page serves real, mutated HTML — via the dashboard or the API.

API: The Proteus Lab Server serves these scenarios as scrapable HTML pages at proteus.terrabench.io.
curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.7"

Click "Resolve All Scenarios" to generate the scenario catalog, then click "Demo" to preview — or hit the API to scrape them programmatically.
