Proteus Entropy Benchmark Suite

A deterministic entropy engine for benchmarking scraper resilience against progressively harder web pages — from raw PRNG bits to fully mutated DOM.

The Problem

The web scraping arms race has no scoreboard.

Every day, scrapers and anti-bot systems fight an invisible battle across the web. Sites deploy honeypot traps, Shadow DOM encapsulation, timing-based content gating, browser fingerprinting, and dozens of other techniques to detect and block automated access. Scrapers adapt. Sites evolve. Neither side has a reproducible way to measure how hard a page is to scrape — or how resilient a scraper is against a known set of defenses.

Proteus changes that. It provides a deterministic, seed-controlled environment where every anti-scraping technique is cataloged, every page mutation is reproducible, and every scraper can be benchmarked against the exact same challenge — today, tomorrow, or a year from now.

The Core Insight

Deterministic randomness: reproducible chaos.

Proteus is built on a simple but powerful idea: if the randomness is deterministic, the chaos is reproducible. Given the same seed, the same entropy level, and the same mutation profiles, Proteus will generate the exact same web page — with the exact same honeypot placement, the exact same obfuscation patterns, the exact same timing traps.

This means you can vary a single parameter — say, raise entropy from 0.3 to 0.8 — and watch the page transform from a simple HTML table into a labyrinth of Shadow DOM nesting, zero-width Unicode obfuscation, and intersection-observer-gated content. Your scraper either handles it or it doesn't, and you know exactly which technique broke it.
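One way to check this reproducibility claim from the scraper side is to fetch the same URL several times and compare digests. The sketch below is a minimal Python illustration, not part of Proteus: the `fetch` callable, `page_digest`, and `is_reproducible` are hypothetical names. It assumes the served body contains no per-request values such as timestamps or session tokens, which the page does not confirm.

```python
import hashlib
from typing import Callable


def page_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a page body."""
    return hashlib.sha256(body).hexdigest()


def is_reproducible(fetch: Callable[[str], bytes], url: str, attempts: int = 3) -> bool:
    """Fetch `url` several times and report whether every body is byte-identical."""
    digests = {page_digest(fetch(url)) for _ in range(attempts)}
    return len(digests) == 1


def fake_fetch(url: str) -> bytes:
    # Offline stand-in for an HTTP client: the body depends only on the URL,
    # which is the property a seed-controlled page is expected to have.
    return ("<html>" + url + "</html>").encode("utf-8")
```

Passing the fetcher in as a parameter keeps the check independent of any particular HTTP library and lets it run offline in a test suite.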

How It Works

Three layers, one pipeline: generate → validate → apply.

1. PRNG Engine

The source of deterministic randomness

Three algorithms — Mulberry32 (fast, JS-compatible), Xoshiro256** (high-quality, 256-bit state), and PCG32 (permuted congruential) — each produce a deterministic stream of pseudorandom bytes from a seed. Same seed, same stream, every time.

Mulberry32 · Xoshiro256** · PCG32
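To make "same seed, same stream" concrete, here is a Python port of the widely circulated public-domain Mulberry32 routine. It is a sketch, not the Proteus Rust implementation: the `take_bytes` helper and its little-endian byte order are assumptions made for illustration, and Proteus may derive bytes from the 32-bit outputs differently.

```python
from typing import Callable

MASK32 = 0xFFFFFFFF


def mulberry32(seed: int) -> Callable[[], int]:
    """Return a function that yields successive 32-bit Mulberry32 outputs."""
    state = seed & MASK32

    def next_u32() -> int:
        nonlocal state
        # All arithmetic is reduced modulo 2**32 to mirror 32-bit integer math.
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        return (t ^ (t >> 14)) & MASK32

    return next_u32


def take_bytes(seed: int, count: int) -> bytes:
    """Produce `count` bytes from the generator (little-endian is an assumption)."""
    rng = mulberry32(seed)
    out = bytearray()
    while len(out) < count:
        out += rng().to_bytes(4, "little")
    return bytes(out[:count])
```

Calling `take_bytes(42, 64)` twice returns the same 64 bytes, while a different seed gives a different stream.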

2. Statistical Validation (Gym)

Proving the randomness is sound

Before randomness drives mutations, it must be validated. The Gym runs a battery of statistical tests: Shannon entropy, chi-squared goodness-of-fit, serial correlation, Monte Carlo π estimation, compression ratio, and eight tests from the NIST SP 800-22 suite (Monobit, Block Frequency, Runs, Longest Run, FFT Spectral, Cumulative Sums, Approximate Entropy, Serial). Each NIST test produces a p-value that is compared against the chosen significance level α; the composite gym score (0–100) summarizes how close the PRNG output is to statistically random.

Shannon Entropy · Chi-Squared · Monte Carlo π · NIST SP 800-22 · FFT Spectral · Gym Score 0–100
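Two of the tests named above are short enough to sketch directly. The Python below computes Shannon entropy in bits per byte and the NIST SP 800-22 Monobit (frequency) p-value. The function names are illustrative, and how Proteus weights these results into the gym score is not stated here, so no scoring is attempted.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (8.0 is the maximum for byte data)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (Monobit) test: p = erfc(|S_n| / sqrt(2n))."""
    n_bits = len(data) * 8
    if n_bits == 0:
        raise ValueError("monobit test needs at least one byte")
    ones = sum(bin(b).count("1") for b in data)
    s_n = 2 * ones - n_bits  # sum of bits mapped to +1 / -1
    s_obs = abs(s_n) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))


def passes(p_value: float, alpha: float = 0.01) -> bool:
    """A test passes when its p-value is at least the significance level."""
    return p_value >= alpha
```

A stream with equal counts of ones and zeros yields a Monobit p-value of 1.0, while an all-ones stream yields a p-value close to zero and fails at α = 0.01.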

3. DOM Application (Lab)

Turning validated entropy into real web pages

The validated PRNG output feeds into 34 scenario templates across four difficulty tiers. Five mutation profiles — structure, layout, copy, timing, behavior — control which DOM axes are randomized. The result: real HTML pages that a scraper must navigate, each deterministically generated from seed + entropy. Serve them via the API, or preview them in the interactive demo.

7 Basic · 9 Scraper · 10 Anti-Bot · 8 Expert · 5 Mutation Profiles

The 34 Scenarios

Each scenario targets a specific scraper challenge. Difficulty scales with entropy — the same page at 0.1 entropy might be trivial; at 0.9, it could defeat production scrapers.

Basic

Card Grid — variable count & ordering
Data Table — rows/columns mutations
Form — field mutation & label changes
Infinite Feed — lazy-loading & virtual lists
Modals & Tooltips — portals & z-stacking
Layout Stress — deeply nested containers
Structured Data — JSON-LD, Open Graph, microdata

Scraper

Pagination — numbered, load-more, infinite
Shadow DOM — open, closed, nested
AJAX/XHR — fetch, dynamic imports, websockets
Iframe Nesting — srcdoc, data-uri, nested
Text Obfuscation — unicode, zero-width, entities
CSS Exfiltration — data in CSS, not DOM text
WebSocket Stream — data via WS, not HTTP
SPA Shell — empty HTML, JS-rendered content
CSV Export — file download with decoy HTML preview
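As an illustration of what the Text Obfuscation scenario asks of a scraper, the sketch below normalizes extracted text by decoding HTML entities, dropping common zero-width characters, and applying Unicode NFKC normalization. It is a generic countermeasure written for this example; the exact obfuscations Proteus applies are not listed here, and the `ZERO_WIDTH` set is an assumption that may be incomplete.

```python
import html
import unicodedata

# Common zero-width code points; this set is illustrative, not exhaustive.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_text(raw: str) -> str:
    """Decode entities, strip zero-width characters, and fold lookalike forms."""
    # Decode first so entity-encoded zero-width characters are also removed.
    text = html.unescape(raw)
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # NFKC folds compatibility forms such as fullwidth digits to ASCII.
    return unicodedata.normalize("NFKC", text)
```

For instance, `normalize_text("pr\u200bi\u200dce: &#36;19.99")` returns `"price: $19.99"`.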

Anti-Bot

Honeypot — invisible fields, CSS hiding, aria traps
Timing Traps — setTimeout, rAF, IntersectionObserver
Fingerprint — canvas, WebGL, audio
Headless Detection — webdriver, plugins, runtime
Session & Cookies — CSRF, visit-gating, storage
Font Cipher — shuffled @font-face glyph mapping
Decoy Injection — plausible fake data mixed in
DOM Sentinel — MutationObserver + property traps
Request Fingerprint — header & TLS validation
Rate Limit Guard — timing enforcement, content scrambling
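For the Honeypot scenario, a form-filling scraper needs to avoid inputs that a human would never see. The sketch below flags `<input>` elements hidden through the `hidden` attribute, `aria-hidden="true"`, or an inline `display:none` / `visibility:hidden` style. It uses only Python's standard `html.parser`; the class and function names are hypothetical, and it cannot detect hiding done through external stylesheets or scripts, which would need computed styles from a real browser.

```python
from html.parser import HTMLParser
from typing import List


class HoneypotFinder(HTMLParser):
    """Collect names of <input> elements that look hidden from human users."""

    def __init__(self) -> None:
        super().__init__()
        self.suspects: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        # Boolean attributes arrive with a value of None; map them to "".
        a = {key: (value or "") for key, value in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        hidden = (
            "hidden" in a
            or a.get("aria-hidden", "").lower() == "true"
            or "display:none" in style
            or "visibility:hidden" in style
        )
        if hidden:
            self.suspects.append(a.get("name", ""))


def find_honeypots(document: str) -> List[str]:
    """Return the names of suspicious inputs in document order."""
    finder = HoneypotFinder()
    finder.feed(document)
    finder.close()
    return finder.suspects
```

Inputs with `type="hidden"` are deliberately not flagged, since they commonly carry legitimate values such as CSRF tokens.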

Expert

Mouse Behavior — velocity, curvature, click timing
CAPTCHA — math, slider, image-select, rotation
Canvas Rendering — data painted on Canvas2D
Proof of Work — Cloudflare-style JS challenge
Polymorphic Markup — different DOM every seed
Multi-Page Journey — 3-page token state machine
Multi-Step Workflow — 5-step login→checkout RPA flow
SVG Data Rendering — data in SVG elements, no HTML text
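The Proof of Work scenario is described only as a "Cloudflare-style JS challenge", so its wire format is unknown. The sketch below shows the general hash-based technique under stated assumptions: the challenge is a string, the client searches for an integer nonce, and success means the SHA-256 digest of `challenge:nonce` starts with a given number of zero bits. All names and the message format are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Check that sha256(challenge:nonce) has at least `difficulty` leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int, max_tries: int = 1_000_000) -> int:
    """Brute-force the smallest nonce that satisfies the difficulty target."""
    for nonce in range(max_tries):
        if verify(challenge, nonce, difficulty):
            return nonce
    raise RuntimeError("no nonce found within max_tries")
```

Each extra bit of difficulty doubles the expected search cost while verification stays a single hash, which is the asymmetry such challenges rely on.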

Choose Your Path

Gym

For: Cryptographers, PRNG authors, QA engineers

Benchmark the statistical quality of pseudorandom number generators. Configure algorithm, seed, sample size, and significance level. Get a composite score (0–100), NIST SP 800-22 test results, and interactive visualizations. Compare algorithms side-by-side.

Lab

For: Scraper developers, anti-bot engineers, QA teams

Resolve all 34 scenarios at your chosen entropy level. Browse the catalog, filter by category, and click into interactive demos that show exactly how each technique mutates the DOM. See what your scraper is up against.

API

For: Automated pipelines, CI/CD, scraper test suites

The Proteus Lab Server at proteus.terrabench.io serves every scenario as a real HTML page. Point your scraper at /lab/honeypot?seed=42&entropy=0.7 and get a fully rendered page with deterministic mutations. JSON endpoints cover inspection, resolution, and scripting.

curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.7"
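The same request can be made from a test suite. The following Python sketch builds the `/lab/:route` URL and fetches it with the standard library. The helper names are illustrative rather than an official client, and the defaults mirror the documented query-parameter defaults (seed 42, entropy 0.5).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://proteus.terrabench.io"


def lab_url(route: str, seed: int = 42, entropy: float = 0.5,
            base_url: str = BASE_URL) -> str:
    """Build the URL of a scenario page for a given seed and entropy level."""
    query = urlencode({"seed": seed, "entropy": entropy})
    return f"{base_url}/lab/{route}?{query}"


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, `fetch_page(lab_url("honeypot", seed=42, entropy=0.7))` requests the same page as the curl command above.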

Technical Foundation

Rust Core

Pure Rust library — zero unsafe, no allocator dependencies. Five modules: prng, entropy, profiles, scenarios, scoring.

WebAssembly

Compiled to WASM via wasm-bindgen. Runs in-browser at near-native speed. ~114 KB payload.

Deterministic

Same seed + same parameters = same output. Always. Every PRNG, every scenario, every mutation. 54 tests verify it.

Axum Server

Standalone HTTP server serves all 34 scenarios as scrapable HTML. JSON API for programmatic access. CORS-enabled.

Configure Benchmark

Select PRNG algorithm, seed, and entropy parameters to run the deterministic benchmark engine.

PRNG Algorithm

Xoshiro256**: High-quality PRNG with 64-bit output and 256-bit state. Excellent statistical properties.

PCG32: Permuted congruential generator. Fast with good uniformity.

Mulberry32: Simple 32-bit PRNG. TypeScript-compatible baseline.

Parameters

Seed · Sample Size (bytes) · Entropy Level (default 0.50) · NIST Significance (α)

Benchmark Results

Shannon Entropy — ideal: 8.0 bits/byte

Chi-Squared p-value — ideal: 0.1 – 0.9

Serial Correlation — ideal: ~0.0

Monte Carlo π — ideal: 3.14159...

Runs Z-Score — ideal: |z| < 1.96

Compression Ratio — ideal: ~1.0

Mean Byte Value — ideal: 127.5

Quality Score — composite 0–100
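Several of the metrics listed above can be computed in a few lines each. The Python sketch below covers mean byte value, the chi-squared statistic against a uniform byte distribution, lag-1 serial correlation, and a Monte Carlo π estimate. These are textbook formulations with illustrative names, not Proteus's scoring code. The Monte Carlo estimate uses one byte per coordinate for brevity, and turning the chi-squared statistic into a p-value requires a chi-squared CDF that is omitted here.

```python
from collections import Counter
from math import sqrt


def mean_byte(data: bytes) -> float:
    """Arithmetic mean of byte values (127.5 for a uniform source)."""
    return sum(data) / len(data)


def chi_squared_statistic(data: bytes) -> float:
    """Chi-squared statistic against a uniform distribution over 256 values."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def serial_correlation(data: bytes) -> float:
    """Pearson correlation between each byte and its successor."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    xs, ys = data[:-1], data[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant input: correlation is undefined, report 0.0
    return cov / sqrt(vx * vy)


def monte_carlo_pi(data: bytes) -> float:
    """Estimate pi from byte pairs treated as points in the unit square."""
    pairs = len(data) // 2
    if pairs == 0:
        raise ValueError("need at least two bytes")
    inside = 0
    for i in range(pairs):
        x, y = data[2 * i] / 255, data[2 * i + 1] / 255
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / pairs
```

For a uniform source the chi-squared statistic should land near its 255 degrees of freedom; values far above or below that are what push the p-value outside the 0.1 to 0.9 band.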

Algorithm Comparison

Metric	Xoshiro256**	PCG32	Mulberry32

NIST SP 800-22

Statistical Test Suite for Random and Pseudorandom Number Generators

α = 0.01 0 bits analyzed

Test	Statistic	p-value	Verdict

Visualizations

Visual analysis of PRNG output distribution and correlation structure.

Scenario Lab

Test your scrapers against 34 entropy-driven scenarios. Each page serves real, mutated HTML — via the dashboard or the API.

Endpoints

Method	Endpoint	Description
`GET`	`/api/health`	Server health check
`GET`	`/api/scenarios`	List all 34 scenario definitions
`GET`	`/api/resolve?seed=&entropy=&profiles=`	Resolve full suite → JSON
`GET`	`/api/scenario/:route?seed=&entropy=`	Single scenario → JSON
`GET`	`/lab/:route?seed=&entropy=&profiles=`	Serve scenario as scrapable HTML

Query Parameters

Param	Default	Description
`seed`	42	PRNG seed — same seed = deterministic output
`entropy`	0.5	Entropy level 0.0–1.0 — higher = more mutations
`profiles`	structure,layout,copy,timing,behavior	Comma-separated mutation profiles
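A client can validate these parameters before sending a request. The Python sketch below builds the query string from the three documented parameters and rejects out-of-range entropy or unrecognized profile names. The validation is a client-side convenience assumed for this example; how the server itself treats invalid values is not documented here.

```python
from typing import Sequence
from urllib.parse import urlencode

KNOWN_PROFILES = ("structure", "layout", "copy", "timing", "behavior")


def build_query(seed: int = 42, entropy: float = 0.5,
                profiles: Sequence[str] = KNOWN_PROFILES) -> str:
    """Return a query string for the /api and /lab endpoints."""
    if not 0.0 <= entropy <= 1.0:
        raise ValueError("entropy must be between 0.0 and 1.0")
    unknown = [p for p in profiles if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError(f"unknown profiles: {unknown}")
    params = {"seed": seed, "entropy": entropy, "profiles": ",".join(profiles)}
    # Keep commas literal so the list reads the same as in the table above.
    return urlencode(params, safe=",")
```

For example, `build_query(123, 0.8, ("structure", "timing"))` returns `seed=123&entropy=0.8&profiles=structure,timing`.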

Examples

# List all scenarios
curl https://proteus.terrabench.io/api/scenarios | jq

# Resolve full suite
curl "https://proteus.terrabench.io/api/resolve?seed=123&entropy=0.8" | jq

# Scrape an actual HTML page (what your scraper sees)
# Quote the URL so the shell does not treat & as a background operator
curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.7"

# Same page, different entropy = different mutations
curl "https://proteus.terrabench.io/lab/honeypot?seed=42&entropy=0.2"

Click "Resolve All Scenarios" to generate the scenario catalog, then click "Demo" to preview — or hit the API to scrape them programmatically.

Interactive Demo

Select a scenario from the Lab to see it in action.

Resolve scenarios in the Lab, then click "Demo" on any scenario card.