Structured Web Audit Crawler — What It Actually Does
The Structured Web Audit Crawler isn’t an “SEO scorecard.” It’s a zero-trust verifier that proves a site says what it claims—in code. It inspects canonical files, structured data, and page content to compute alignment, confirm trust-mesh participation, and surface anything that silently degrades integrity (autoloaded scripts, cookies, semantic drift). Output is a machine-readable report plus human-readable rollups.
What It Verifies (and Why It Matters)
Instead of nudging cosmetic fixes, the crawler enforces protocol: canonical JSON-LD must exist and match the visible HTML; verification routes must backlink the mesh; zero-trust rules must hold; and performance must be real (sub-second home if you care about first impression). In short—trust is measured, not marketed.
- Structure: Required endpoints present and internally consistent (/, /verify.html, /verify.json, /robots.txt, /sitemap.xml, etc.).
- Schema: Valid JSON-LD (and microdata if present) extracted and archived; invalid or missing structured data fails the page.
- Trust loop: Backlink to structuredweb.org/verify enforced on the root and both verify routes; visible HTML link required on /verify.html (a minimal check is sketched after this list).
- Zero-trust: Flags autoloaded third-party JS, server-set cookies, and popups/overlays on non-home routes.
- Semantic alignment: Compares keywords claimed in JSON-LD to what’s actually written in the HTML body; lists missing terms so you can fix drift.
- Performance: Measures real fetch time and penalizes slow homepages; favors static, edge-cached delivery.
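The "Trust loop" rule above is the core of mesh participation, so here is a minimal sketch of the idea, assuming requests and BeautifulSoup are installed. It uses a crude substring test where the real rules/trust.py (reproduced later on this page) walks isPartOf/sameAs keys explicitly; example.com is a placeholder domain.

import json
import requests
from bs4 import BeautifulSoup

TRUST_URL = "https://structuredweb.org/verify"

def has_mesh_backlink(url: str) -> bool:
    # Fetch the page and scan every JSON-LD block for the mesh verify URL.
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue
        if TRUST_URL in json.dumps(data).lower():  # crude containment check
            return True
    return False

print(has_mesh_backlink("https://example.com/verify.html"))  # placeholder URL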
How Scoring Works
Each page starts at 100 and loses points for high-impact failures (no JSON-LD, missing mesh backlink on required paths, zero-trust violations, slow home). Alignment below threshold subtracts proportionally. A domain rollup aggregates averages, pass/fail counts, and a mesh-participation grade based on the verify-link triad.
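The exact deductions live in core/meta_score.py, reproduced in the mastersheet below. As a quick worked example with hypothetical numbers, consider a /verify.html page that carries JSON-LD but lacks the visible verify anchor and only reaches 50% alignment:

score = 100
backlink_score = 1                                # JSON-LD backlink present, visible anchor missing
score -= (2 - backlink_score) * 10                # -10 of the possible 20-point backlink weight
alignment = 50                                    # below the 70% threshold
score -= round((70 - alignment) * (10 / 70), 2)   # about -2.86, proportional penalty
score -= 5                                        # page status is FAIL overall
print(max(int(score), 0))                         # 82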
Outputs You Can Act On
- Per-page reports: status, load time, violations, alignment %, shared/missing terms.
- Raw schema archive: the exact JSON-LD blocks the crawler saw.
- Site report: average score and alignment, participation grade, and a compact summary for executives.
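One way to consume these artifacts after a crawl, as a sketch: the paths follow config.py (outputs/pages, outputs/pages/raw_schema, outputs/sites), and the filenames are URL-derived slugs written by core/report_writer.py.

import json
from pathlib import Path

# List every archived JSON-LD dump and summarize what it contains.
for schema_file in Path("outputs/pages/raw_schema").glob("*.json"):
    blocks = json.loads(schema_file.read_text(encoding="utf-8"))
    types = [b.get("@type", "?") for b in blocks if isinstance(b, dict)]
    print(f"{schema_file.name}: {len(blocks)} JSON-LD block(s), types={types}")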
Why this is different: The crawler treats your site as a verifiable data product. If you claim it in JSON-LD, you must show it in HTML. If you join the mesh, you must link it where it counts. No trackers. No surprises. Just provable integrity.
Deep Mechanics
Five cooperating modules do the work:
- Performance Audit: measures live fetch time and checks JS/cookie behavior.
- Schema Audit: extracts JSON-LD (and microdata), validates structure, and stores a raw dump.
- Trust Backlink Audit: enforces isPartOf/sameAs to the mesh on /, /verify.html, and /verify.json (with a visible anchor on the HTML variant).
- Zero-Trust Audit: flags autoloaded third-party scripts, server cookies, and load-time popups on non-home routes.
- Semantic Alignment: tokenizes JSON-LD descriptions/keywords vs. body text and computes overlap; missing terms are listed explicitly.
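The alignment number is simply the share of structured-data keywords that survive stopword filtering and also appear in the body text, as implemented in rules/semantic_alignment.py. A simplified sketch with a shortened stopword list and made-up sample strings:

import re

STOPWORDS = {"the", "and", "for", "with"}  # the real list is much longer

def keywords(text: str) -> set:
    return {w for w in re.findall(r"\b[a-zA-Z0-9]{3,}\b", text.lower()) if w not in STOPWORDS}

sd_text = "Zero-trust site audits with verifiable structured data"      # sample JSON-LD description
body_text = "We run zero-trust audits and publish verifiable reports"   # sample page copy

sd_terms = keywords(sd_text)
shared = sd_terms & keywords(body_text)
alignment = round(len(shared) / len(sd_terms) * 100, 2) if sd_terms else 0
print(alignment, sorted(sd_terms - keywords(body_text)))  # alignment % plus missing terms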
Operating Modes
Use it three ways:
- Single URL: spot-check a page during development.
- Sitemap crawl: audit the whole site from /sitemap.xml.
- Mesh-wide: resolve /mesh.json entries and recurse across all listed domains for network-health reporting.
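audit.py drives these modes through an interactive menu, but the per-page entry point can also be called directly. A sketch, assuming you run it from the repository root so core/ and rules/ are importable, with a placeholder URL:

from core.audit_runner import audit_page

report = audit_page("https://example.com/verify.html")  # placeholder URL
print(report["status"], report.get("alignment_percent"), report.get("violations"))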
Implementation Signals
- User-Agent: StructuredWebAuditBot/1.0 (+https://structuredweb.org/crawler)
- Respectful crawl: honors robots.txt, rate-limited, caches results (see the sketch after this list).
- Zero-dependency preference: static HTML5 + JSON-LD; inline CSS or single stylesheet; no analytics/trackers by default.
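The crawl policy above can be illustrated with the standard library; this is a sketch of the stated behavior, not code from the mastersheet below (which does not itself parse robots.txt). example.com is a placeholder.

import time
import requests
from urllib import robotparser

UA = "StructuredWebAuditBot/1.0 (+https://structuredweb.org/crawler)"

rp = robotparser.RobotFileParser("https://example.com/robots.txt")  # placeholder site
rp.read()

for path in ["/", "/verify.html", "/verify.json"]:
    url = f"https://example.com{path}"
    if rp.can_fetch(UA, url):
        requests.get(url, headers={"User-Agent": UA}, timeout=10)
    time.sleep(1)  # simple rate limit between requests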
Upgrade Path: From “Pass” to “Provable”
Passing is the floor. To raise your trust ceiling: keep verify mirrors in lockstep (HTML ↔ JSON), maintain sub-second home, ship WebP only, and treat JSON-LD as your canonical source of truth. The more deterministic your structure, the stronger your mesh gravity—and the easier it is for agents to cite you.
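A small self-check for the WebP guideline, sketched against a placeholder homepage; the auditor itself does not currently test image formats, so treat this as an illustration of the recommendation rather than part of the tool.

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/", timeout=10).text  # placeholder site
soup = BeautifulSoup(html, "html.parser")
non_webp = [img["src"] for img in soup.find_all("img", src=True)
            if not img["src"].lower().endswith(".webp")]
print("Non-WebP images:", non_webp or "none")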
Common Failure Patterns
- Structured data says one thing; the page says another (semantic drift).
- Forgot the visible verify link on /verify.html.
- "Harmless" analytics autoloading on every route.
- Speed regressions after redesigns (images not compressed to WebP, added fonts/scripts).
Adoption Checklist
- Root, /verify.html, and /verify.json exist and backlink the mesh.
- Every non-.txt/.json page ships valid JSON-LD.
- Homepage FCP < 1s; images are WebP; CSS is lean.
- No autoloaded third-party JS or cookies on non-home routes.
- JSON-LD keywords are present (somewhere) in your HTML copy.
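A quick pre-flight sketch for this checklist, hitting the required routes on a placeholder domain; it only confirms the endpoints respond, while the full auditor also inspects their content.

import requests

base = "https://example.com"  # placeholder domain
for path in ["/", "/verify.html", "/verify.json", "/robots.txt", "/sitemap.xml"]:
    try:
        code = requests.get(base + path, timeout=10).status_code
    except requests.RequestException as e:
        code = f"error: {e}"
    print(f"{path}: {code}")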
Still having trouble understanding? Feed this page URL or this section to your LLM and ask it to summarize in layman's terms.
This box contains the raw “mastersheet” inline in the HTML for maximum indexability. No scripts, no iframes.
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\audit.py
****************************************
# structuredweb_auditor/audit.py
import sys
import requests
from urllib.parse import urlparse
from bs4 import BeautifulSoup
from core.audit_runner import audit_page
from core.site_report import write_combined_report
def resolve_url(raw: str) -> str:
    if not raw.startswith(("http://", "https://")):
        raw = "https://" + raw
    parsed = urlparse(raw)
    if not parsed.netloc:
        raise ValueError("Invalid URL — missing domain.")
    return raw.rstrip("/") + "/" if parsed.path in ["", "/"] else raw

def parse_sitemap(sitemap_url: str) -> list:
    try:
        resp = requests.get(sitemap_url, timeout=10)
        soup = BeautifulSoup(resp.content, "xml")
        return [loc.text.strip() for loc in soup.find_all("loc")]
    except Exception as e:
        print(f"❌ Failed to load sitemap: {str(e)}")
        return []

def parse_mesh() -> list:
    mesh_url = "https://structuredweb.org/mesh.json"
    print(f"\n📡 Auto-loading mesh from: {mesh_url}")
    try:
        resp = requests.get(mesh_url, timeout=10)
        data = resp.json()
    except Exception as e:
        print(f"❌ Failed to fetch or parse mesh.json: {str(e)}")
        return []
    sitemaps = set()
    dist = data.get("distribution", [])
    if not isinstance(dist, list):
        print("⚠️ 'distribution' is missing or malformed.")
        return []
    for item in dist:
        content_url = item.get("contentUrl", "") if isinstance(item, dict) else ""
        if "/verify.json" in content_url:
            base_url = content_url.rsplit("/verify.json", 1)[0]
            sitemap_url = base_url + "/sitemap.xml"
            sitemaps.add(sitemap_url)
    return sorted(sitemaps)

def main():
    print("Welcome to Structured Web Auditor\n")
    print("What would you like to audit?")
    print("[1] Single URL")
    print("[2] Full sitemap of a domain")
    print("[3] Entire mesh (all pages in mesh.json sitemaps)\n")
    choice = input("Enter choice [1–3]: ").strip()
    if choice == "1":
        raw_url = input("Enter the full or partial URL to audit: ").strip()
        try:
            url = resolve_url(raw_url)
        except ValueError as ve:
            print(f"\n❌ {ve}")
            sys.exit(1)
        print(f"\n📡 Auditing {url}...\n")
        result = audit_page(url)
        print_summary(result)
    elif choice == "2":
        domain = input("Enter domain (e.g., example.com): ").strip().lower()
        sitemap_url = f"https://{domain}/sitemap.xml"
        print(f"\n📂 Fetching sitemap: {sitemap_url}")
        urls = parse_sitemap(sitemap_url)
        if not urls:
            print("⚠️ No URLs found in sitemap.")
            sys.exit(1)
        print(f"🔍 Found {len(urls)} URLs. Beginning audit...\n")
        page_reports = []
        for i, url in enumerate(urls):
            print(f" [{i+1}/{len(urls)}] Auditing: {url}")
            result = audit_page(url)
            page_reports.append(result)
        write_combined_report(domain, page_reports)
        print("✅ Domain-wide audit complete.")
    elif choice == "3":
        sitemaps = parse_mesh()
        if not sitemaps:
            print("⚠️ No sitemaps found in mesh.")
            sys.exit(1)
        all_urls = []
        for sm in sitemaps:
            print(f"\n📂 Parsing sitemap: {sm}")
            urls = parse_sitemap(sm)
            if urls:
                all_urls.extend(urls)
        print(f"\n🔍 Total URLs found across mesh: {len(all_urls)}\n")
        page_reports = []
        for i, url in enumerate(all_urls):
            print(f" [{i+1}/{len(all_urls)}] Auditing: {url}")
            result = audit_page(url)
            page_reports.append(result)
        write_combined_report("structuredweb.org", page_reports)
        print("✅ Mesh-wide audit complete.")
    else:
        print("Invalid choice.")
        sys.exit(1)

def print_summary(result: dict):
    print("=" * 60)
    print(f"🧾 Audit Complete: {result['url']}")
    print(f"Status: {result['status']}")
    print(f"Load time: {result.get('load_time_ms', 'n/a')} ms")
    print(f"Backlink required: {result.get('backlink_required', 'n/a')}")
    print(f"Backlink found: {result.get('backlink_found', 'n/a')}")
    print(f"Structured Data: {result.get('structured_data_present', False)}")
    print(f"Alignment Score: {result.get('alignment_percent', 0)}%")
    print("\nViolations:")
    if result.get("violations"):
        for v in result["violations"]:
            print(f" - {v}")
    else:
        print("None ✅")
    print("=" * 60)

if __name__ == "__main__":
    main()
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\config.py
****************************************
# structuredweb_auditor/config.py
# Paths
OUTPUT_DIR = "outputs"
PAGES_DIR = f"{OUTPUT_DIR}/pages"
SITES_DIR = f"{OUTPUT_DIR}/sites"
RAW_SCHEMA_DIR = f"{PAGES_DIR}/raw_schema"
# User-Agent
USER_AGENT = "StructuredWebAuditor/1.0"
# Backlink enforcement
REQUIRED_BACKLINK_PATHS = ["/", "/verify.html", "/verify.json"]
TRUST_URL = "https://structuredweb.org/verify"
# Scoring thresholds
ALIGNMENT_THRESHOLD = 70 # percent
HOMEPAGE_MAX_LOAD_MS = 1000 # ms
# Audit Order (for future sitemap/mesh sorting)
AUDIT_ORDER = [
    "/robots.txt",
    "/ai.json", "/ai.html",
    "/verify.json",
    "/mesh.json",
    "/manifest.json",
    "/assistant_context.json",
    "/sitemap.xml",
    "/genesis.txt", "/humans.txt"  # bonus only
]
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\core\audit_runner.py
****************************************
# structuredweb_auditor/core/audit_runner.py
import os
import requests
from urllib.parse import urlparse
from typing import Tuple
from rules.performance import audit_performance
from rules.schema import audit_schema
from rules.trust import audit_backlink
from rules.zero_trust import audit_zero_trust
from rules.semantic_alignment import audit_semantic_alignment
from core.report_writer import write_page_report, write_raw_schema
OUTPUT_DIR = "outputs/pages"
RAW_SCHEMA_DIR = "outputs/pages/raw_schema"
def fetch_page(url: str) -> Tuple[str, str]:
    headers = {"User-Agent": "StructuredWebAuditor/1.0"}
    response = requests.get(url, headers=headers, timeout=10)
    return response.text, response.url

def sanitize_slug(url: str) -> str:
    parsed = urlparse(url)
    slug = parsed.path.strip("/").replace("/", "-") or "home"
    return f"{parsed.netloc.replace('.', '_')}-{slug}"

def ensure_output_dirs():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    os.makedirs(RAW_SCHEMA_DIR, exist_ok=True)

def audit_page(url: str) -> dict:
    try:
        html_content, final_url = fetch_page(url)
    except Exception as e:
        return {
            "url": url,
            "status": "FAIL",
            "violations": [f"Failed to fetch URL: {str(e)}"]
        }
    # Run audits
    perf = audit_performance(final_url, html_content)
    schema = audit_schema(final_url, html_content)
    trust = audit_backlink(final_url, html_content)
    zero = audit_zero_trust(final_url, html_content)
    alignment = audit_semantic_alignment(
        html=html_content,
        json_ld_blocks=schema.get("json_ld_data", []),
        microdata_items=schema.get("microdata_data", [])
    )
    slug = sanitize_slug(final_url)
    ensure_output_dirs()
    all_pass = all([
        perf["status"] == "PASS",
        schema["status"] == "PASS",
        trust["status"] == "PASS",
        zero["status"] == "PASS"
    ])
    is_json_page = final_url.lower().endswith(".json")
    structured_ok = schema.get("has_json_ld", False)
    summary = {
        "url": final_url,
        "status": "PASS" if all_pass else "FAIL",
        "load_time_ms": perf.get("load_time_ms"),
        "backlink_required": trust.get("required"),
        "backlink_found": trust.get("backlink_found"),
        "backlink_score": trust.get("backlink_score", 0),
        "alignment_percent": alignment.get("alignment_percent", 0),
        "structured_data_present": schema.get("has_json_ld", False),
        "violations": (
            perf.get("violations", [])
            + schema.get("violations", [])
            + trust.get("violations", [])
            + zero.get("violations", [])
            + [f"Missing structured keyword: {term}" for term in alignment.get("missing_terms", [])]
        )
    }
    write_page_report(slug, final_url, summary, schema, alignment, debug_logs=zero.get("debug_log", []))
    write_raw_schema(slug, schema.get("json_ld_data", []))
    return summary
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\core\meta_score.py
****************************************
# structuredweb_auditor/core/meta_score.py
from typing import List, Dict
def compute_page_score(report: Dict) -> int:
    score = 100
    url = report.get("url", "").lower()
    path = "/" + url.split("/", 3)[-1].split("?", 1)[0].split("#", 1)[0]
    if path in ["/", "/index.html"]:
        path = "/"
    backlink_scored_paths = {"/", "/verify.html", "/verify.json"}
    # 25% — Structured Data (mandatory)
    if not report.get("structured_data_present", False):
        score -= 25
    # 20% — Backlink score (only on root and verify paths)
    if path in backlink_scored_paths:
        backlink_score = report.get("backlink_score")
        if isinstance(backlink_score, int):
            max_backlink_per_page = 2
            penalty = (max_backlink_per_page - backlink_score) * 10
            score -= penalty
    # 25% — Zero Trust (cookies, popups, autoloaded JS)
    if any(
        "cookie" in v.lower() or "popup" in v.lower() or "autoloaded js" in v.lower()
        for v in report.get("violations", [])
    ):
        score -= 25
    # 10% — Load Time (only homepage punished if >1s)
    if path == "/" and (report.get("load_time_ms") or 0) > 1000:
        score -= 10
    # 10% — Semantic Alignment (proportional penalty below 70%)
    alignment = report.get("alignment_percent", 100)
    if alignment < 70:
        score -= round((70 - alignment) * (10 / 70), 2)
    # 5% — Overall FAIL
    if report.get("status") != "PASS":
        score -= 5
    return max(int(score), 0)

def compute_sitewide_score(page_reports: List[Dict]) -> Dict:
    scores = [compute_page_score(report) for report in page_reports]
    average = round(sum(scores) / len(scores), 2) if scores else 0
    alignment_values = [r.get("alignment_percent", 0) for r in page_reports]
    average_alignment = round(sum(alignment_values) / len(alignment_values), 2) if alignment_values else 0
    return {
        "page_scores": scores,
        "average_score": average,
        "average_alignment": average_alignment,
        "pages_passed": sum(1 for r in page_reports if r.get("status") == "PASS"),
        "pages_failed": sum(1 for r in page_reports if r.get("status") != "PASS"),
        "total_pages": len(scores)
    }
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\core\report_writer.py
****************************************
# core/report_writer.py
import os
import json
from typing import Dict, List
OUTPUT_DIR = "outputs/pages"
RAW_SCHEMA_DIR = "outputs/pages/raw_schema"
def write_page_report(slug: str, final_url: str, summary: Dict, schema: Dict, alignment: Dict, debug_logs: List[str] = None):
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    os.makedirs(RAW_SCHEMA_DIR, exist_ok=True)
    report_path = os.path.join(OUTPUT_DIR, f"{slug}.txt")
    with open(report_path, "w", encoding="utf-8") as f:
        f.write(f"URL: {final_url}\n")
        f.write("Raw JSON-LD:\n")
        f.write(json.dumps(schema["json_ld_data"], indent=2, ensure_ascii=False))
        f.write("\n\n--- AUDIT SUMMARY ---\n")
        f.write(f"Status: {summary['status']}\n")
        f.write(f"Load time: {summary['load_time_ms']} ms\n")
        f.write(f"Backlink required: {summary['backlink_required']}\n")
        f.write(f"Backlink found: {summary['backlink_found']}\n")
        f.write(f"Alignment %: {summary['alignment_percent']}%\n")
        f.write(f"Structured data present (JSON-LD): {summary['structured_data_present']}\n\n")
        f.write("Violations:\n")
        if summary["violations"]:
            for v in summary["violations"]:
                f.write(f"- {v}\n")
        else:
            f.write("None\n")
        f.write("\nSemantic Terms (shared):\n")
        for term in alignment.get("shared_terms", []):
            f.write(f"✔ {term}\n")
        f.write("\nSemantic Terms (missing):\n")
        for term in alignment.get("missing_terms", []):
            f.write(f"✘ {term}\n")
        f.write("\nRaw Microdata:\n")
        for md in schema["microdata_data"]:
            f.write(md.get("html", "") + "\n")
        if debug_logs:
            f.write("\n--- Zero Trust Debug ---\n")
            for line in debug_logs:
                f.write(line + "\n")

def write_raw_schema(slug: str, json_ld_data: List[Dict]):
    os.makedirs(RAW_SCHEMA_DIR, exist_ok=True)
    json_path = os.path.join(RAW_SCHEMA_DIR, f"{slug}.json")
    with open(json_path, "w", encoding="utf-8") as jf:
        json.dump(json_ld_data, jf, indent=2, ensure_ascii=False)
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\core\site_report.py
****************************************
# core/site_report.py
import os
from urllib.parse import urlparse
from typing import List, Dict
from core.meta_score import compute_sitewide_score
SITES_DIR = "outputs/sites"
def extract_path(url: str) -> str:
    try:
        parsed = urlparse(url)
        path = parsed.path.rstrip("/") or "/"
        return path
    except:
        return "/"

def write_combined_report(domain: str, page_reports: List[Dict]):
    os.makedirs(SITES_DIR, exist_ok=True)
    site_summary = compute_sitewide_score(page_reports)
    report_path = os.path.join(SITES_DIR, f"{domain}.txt")
    verify_json_score = 0
    verify_html_score = 0
    home_score = 0
    for r in page_reports:
        path = extract_path(r.get("url", "").lower())
        score = r.get("backlink_score")
        if score is None:
            continue
        if path in ["/verify", "/verify.html"]:
            verify_html_score = score
        elif path == "/verify.json":
            verify_json_score = score
        elif path == "/":
            home_score = score
    total_backlink_score = (
        min(verify_html_score, 2) +
        min(verify_json_score, 1) +
        min(home_score, 1)
    )
    if total_backlink_score == 4:
        backlink_grade = "🟢 Perfect"
    elif total_backlink_score == 3:
        backlink_grade = "✅ Good Standing"
    elif total_backlink_score == 2:
        backlink_grade = "⚠ Needs Work"
    else:
        backlink_grade = "❌ Not Eligible"
    with open(report_path, "w", encoding="utf-8") as f:
        f.write(f"📡 SITE REPORT — {domain}\n")
        f.write("=" * 50 + "\n\n")
        f.write(f"Total Pages Audited: {site_summary['total_pages']}\n")
        f.write(f"Pages Passed: {site_summary['pages_passed']}\n")
        f.write(f"Pages Failed: {site_summary['pages_failed']}\n")
        f.write(f"Average Score: {site_summary['average_score']}\n")
        f.write(f"Mesh Health: {site_summary.get('average_alignment', 0)}% Alignment\n")
        f.write(f"Structured Web Participation: {backlink_grade} ({total_backlink_score}/4)\n\n")
        f.write("Participation Breakdown:\n")
        f.write(f"- /verify.html or /verify: {verify_html_score}/2\n")
        f.write(f"- /verify.json: {verify_json_score}/1\n")
        f.write(f"- / (homepage): {home_score}/1\n\n")
        f.write("Per-Page Scores:\n")
        for i, score in enumerate(site_summary["page_scores"]):
            f.write(f"- Page {i + 1}: {score}\n")
        f.write("\n\n--- AUDIT SUMMARIES ---\n")
        for i, report in enumerate(page_reports):
            f.write(f"\n--- Page {i + 1} ---\n")
            f.write(f"URL: {report.get('url', 'n/a')}\n")
            f.write(f"Status: {report.get('status')}\n")
            f.write(f"Load time: {report.get('load_time_ms', 'n/a')} ms\n")
            f.write(f"Backlink required: {report.get('backlink_required', 'n/a')}\n")
            f.write(f"Backlink found: {report.get('backlink_found', 'n/a')}\n")
            f.write(f"Structured Data: {report.get('structured_data_present', False)}\n")
            f.write(f"Alignment Score: {report.get('alignment_percent', 0)}%\n")
            if isinstance(report.get("backlink_score"), int):
                f.write(f"Backlink Score: {report['backlink_score']}/2 (per page max)\n")
            f.write("Violations:\n")
            if report.get("violations"):
                for v in report["violations"]:
                    f.write(f" - {v}\n")
            else:
                f.write("None ✅\n")
            debug = report.get("debug_log", [])
            if debug:
                f.write("\n--- DEBUG LOG ---\n")
                for line in debug:
                    f.write(f"{line}\n")
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\rules\performance.py
****************************************
# structuredweb_auditor/rules/performance.py
import time
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
USER_AGENT = "StructuredWebAuditor/1.0"
def audit_performance(url: str, html_content: str) -> dict:
    result = {
        "load_time_ms": None,
        "status": "PASS",
        "violations": [],
        "autoloaded_js": [],
        "cookies_set": []
    }
    # 1. Measure load time
    try:
        headers = {"User-Agent": USER_AGENT}
        start = time.perf_counter()
        response = requests.get(url, headers=headers, timeout=10)
        end = time.perf_counter()
        result["load_time_ms"] = int((end - start) * 1000)
    except Exception as e:
        result["status"] = "FAIL"
        result["violations"].append(f"Page load error: {str(e)}")
        return result
    # Homepage speed rule
    parsed_url = urlparse(url)
    is_homepage = parsed_url.path in ["/", ""]
    if is_homepage and result["load_time_ms"] > 1000:
        result["status"] = "FAIL"
        result["violations"].append(f"Homepage load time exceeds 1 second: {result['load_time_ms']}ms")
    # 2. Parse HTML for JS includes
    soup = BeautifulSoup(html_content, "html.parser")
    script_tags = soup.find_all("script", src=True)
    for tag in script_tags:
        src = tag["src"]
        if not any(allowed in src for allowed in ["kworker", "durable", "edge"]):
            result["autoloaded_js"].append(src)
    if result["autoloaded_js"] and not is_homepage:
        result["status"] = "FAIL"
        result["violations"].append(f"Autoloaded JS found: {result['autoloaded_js']}")
    # 3. Check cookies
    jar = requests.cookies.RequestsCookieJar()
    try:
        resp = requests.get(url, headers=headers, cookies=jar)
        if resp.cookies:
            for c in resp.cookies:
                result["cookies_set"].append(f"{c.name}={c.value}")
    except:
        pass
    if result["cookies_set"] and not is_homepage:
        result["status"] = "FAIL"
        result["violations"].append("Autoloaded cookies set without user interaction")
    return result
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\rules\schema.py
****************************************
from urllib.parse import urlparse
from bs4 import BeautifulSoup
import json
def audit_schema(url: str, html_content: str) -> dict:
    result = {
        "status": "PASS",
        "violations": [],
        "has_json_ld": False,
        "has_microdata": False,
        "json_ld_data": [],
        "microdata_data": [],
    }
    parsed = urlparse(url)
    path = parsed.path.lower()
    if path.endswith(".json"):
        try:
            parsed_json = json.loads(html_content)
            if isinstance(parsed_json, dict):
                result["json_ld_data"] = [parsed_json]
                result["has_json_ld"] = True
            elif isinstance(parsed_json, list):
                result["json_ld_data"] = parsed_json
                result["has_json_ld"] = True
            else:
                result["status"] = "FAIL"
                result["violations"].append("Unsupported JSON structure.")
        except json.JSONDecodeError:
            result["status"] = "FAIL"
            result["violations"].append("Invalid JSON.")
        return result
    # Regular HTML structured data logic
    soup = BeautifulSoup(html_content, "html.parser")
    json_ld = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
            if isinstance(data, dict):
                json_ld.append(data)
            elif isinstance(data, list):
                json_ld.extend(data)
        except json.JSONDecodeError:
            continue
    result["has_json_ld"] = bool(json_ld)
    result["json_ld_data"] = json_ld
    if not json_ld:
        result["status"] = "FAIL"
        result["violations"].append("No JSON-LD structured data found.")
    return result
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\rules\semantic_alignment.py
****************************************
# structuredweb_auditor/rules/semantic_alignment.py
import re
from bs4 import BeautifulSoup
from typing import List, Dict
# Common and domain-specific noise terms to skip
STOPWORDS = set([
    "the", "and", "for", "with", "that", "this", "from", "are", "was", "were",
    "has", "have", "had", "you", "your", "our", "but", "not", "any", "can",
    "more", "all", "its", "out", "get", "how", "use", "see", "now", "new",
    "we", "us", "they", "their", "them", "it", "on", "in", "by", "as", "an",
    "of", "a", "to", "is", "or", "be", "at", "via", "if",
    # AI Structured Web domain-specific noise
    "structured", "web", "ai", "indexer", "resolver", "node", "mesh",
    "verify", "dual", "layered", "handshake", "agent", "subnode", "compliance",
    "claim", "license", "category", "link", "endpoint", "semantic", "trust"
])

def extract_keywords(text: str) -> List[str]:
    words = re.findall(r"\b[a-zA-Z0-9]{3,}\b", text.lower())
    return [word for word in words if word not in STOPWORDS]

def extract_json_ld_keywords(json_ld_blocks: List[dict]) -> List[str]:
    descriptions = []

    def extract_desc(obj):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if "description" in k.lower() or "keywords" in k.lower():
                    if isinstance(v, str):
                        descriptions.append(v)
                    elif isinstance(v, list):
                        descriptions.extend([str(i) for i in v])
                elif isinstance(v, (dict, list)):
                    extract_desc(v)
        elif isinstance(obj, list):
            for item in obj:
                extract_desc(item)

    for block in json_ld_blocks:
        extract_desc(block)
    return extract_keywords(" ".join(descriptions))

def extract_microdata_keywords(microdata_items: List[dict]) -> List[str]:
    desc_texts = []
    for item in microdata_items:
        props = item.get("props", {})
        for key, val in props.items():
            if "description" in key.lower() and isinstance(val, str):
                desc_texts.append(val)
    return extract_keywords(" ".join(desc_texts))

def audit_semantic_alignment(
    html: str,
    json_ld_blocks: List[dict],
    microdata_items: List[dict]
) -> Dict:
    soup = BeautifulSoup(html, "html.parser")
    html_text = soup.get_text(separator=" ", strip=True)
    html_keywords = set(extract_keywords(html_text))
    json_keywords = set(extract_json_ld_keywords(json_ld_blocks))
    micro_keywords = set(extract_microdata_keywords(microdata_items))
    shared = json_keywords.intersection(html_keywords)
    all_keywords = json_keywords.union(html_keywords)
    alignment_percent = (
        round(len(shared) / len(json_keywords) * 100, 2) if json_keywords else 0
    )
    return {
        "alignment_percent": alignment_percent,
        "shared_terms": sorted(list(shared)),
        "missing_terms": sorted(list(json_keywords - html_keywords)),
        "total_sd_terms": len(json_keywords)
    }
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\rules\trust.py
****************************************
from bs4 import BeautifulSoup
from urllib.parse import urlparse
import json
REQUIRED_BACKLINK_URL = "https://structuredweb.org/verify"
REQUIRED_PATHS = {"/", "/verify.html", "/verify.json", "/verify"}
def find_backlink_in_json(obj) -> bool:
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k.lower() == "ispartof" and isinstance(v, dict):
                if v.get("url", "").strip().lower() == REQUIRED_BACKLINK_URL:
                    return True
            if k.lower() == "sameas":
                if isinstance(v, list):
                    if REQUIRED_BACKLINK_URL in [str(item).strip().lower() for item in v]:
                        return True
            if isinstance(v, (dict, list)):
                if find_backlink_in_json(v):
                    return True
            elif isinstance(v, str):
                if v.strip().lower() == REQUIRED_BACKLINK_URL:
                    return True
    elif isinstance(obj, list):
        for item in obj:
            if find_backlink_in_json(item):
                return True
    return False

def audit_backlink(url: str, html_content: str) -> dict:
    parsed = urlparse(url)
    path = (parsed.path or "/").rstrip("/") or "/"
    is_verify_html = path in ["/verify", "/verify.html"]
    is_verify_json = path == "/verify.json"
    is_home = path == "/"
    result = {
        "status": "PASS",
        "violations": [],
        "required": path in REQUIRED_PATHS,
        "backlink_found": False,
        "html_backlink": False,
        "sd_backlink": False,
        "backlink_score": None
    }
    found_json = False
    found_html = False
    if is_verify_json:
        try:
            data = json.loads(html_content)
            found_json = find_backlink_in_json(data)
        except json.JSONDecodeError:
            found_json = False
        result["sd_backlink"] = found_json
        if not found_json:
            result["status"] = "FAIL"
            result["violations"].append("Missing required isPartOf backlink in /verify.json structured data.")
    else:
        soup = BeautifulSoup(html_content, "html.parser")
        # Check JSON-LD
        for script in soup.find_all("script", type="application/ld+json"):
            try:
                data = json.loads(script.string or "")
                if find_backlink_in_json(data):
                    found_json = True
                    break
            except json.JSONDecodeError:
                continue
        result["sd_backlink"] = found_json
        if path in REQUIRED_PATHS and not found_json:
            result["status"] = "FAIL"
            result["violations"].append(f"{path} is missing isPartOf backlink in structured data.")
        # Strict visible HTML anchor check
        if is_verify_html:
            for tag in soup.find_all("a", href=True):
                href = tag["href"].strip().lower()
                text = tag.get_text(strip=True).lower()
                if href == REQUIRED_BACKLINK_URL and "structuredweb.org/verify" in text:
                    found_html = True
                    break
            result["html_backlink"] = found_html
            if not found_html:
                result["status"] = "FAIL"
                result["violations"].append(f"{path} is missing visible HTML link to {REQUIRED_BACKLINK_URL}")
    # Score logic
    if path in REQUIRED_PATHS:
        score = 0
        if result["sd_backlink"]:
            score += 1
        if is_verify_html and result["html_backlink"]:
            score += 1
        result["backlink_score"] = score
    else:
        result["backlink_score"] = None
    result["backlink_found"] = result["sd_backlink"] or result["html_backlink"]
    # Verify.html must contain both
    if is_verify_html and (not result["sd_backlink"] or not result["html_backlink"]):
        result["status"] = "FAIL"
        result["violations"].append(f"{path} must contain both structured data and visible HTML backlink.")
    # Non-required routes skip backlink reporting
    if path not in REQUIRED_PATHS:
        result["violations"] = []
    return result
****************************************
FILE: C:/Users/User/Desktop/SiteAudits\rules\zero_trust.py
****************************************
# structuredweb_auditor/rules/zero_trust.py
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse
EDGE_WHITELIST = ["kworker", "durable", "do.cloudflare"]
USER_AGENT = "StructuredWebAuditor/1.0"
def audit_zero_trust(url: str, html_content: str) -> dict:
    debug_log = [f"🔍 Zero Trust Audit: {url}"]
    result = {
        "status": "PASS",
        "violations": [],
        "autoloaded_scripts": [],
        "blocked_cookies": [],
        "popup_detected": False,
        "debug_log": debug_log
    }
    parsed = urlparse(url)
    is_homepage = parsed.path in ["/", ""]
    soup = BeautifulSoup(html_content, "html.parser")
    # 1. Detect external scripts
    script_tags = soup.find_all("script", src=True)
    for tag in script_tags:
        src = tag["src"]
        if not any(allowed in src for allowed in EDGE_WHITELIST):
            result["autoloaded_scripts"].append(src)
    if result["autoloaded_scripts"]:
        msg = f"✘ Autoloaded scripts found: {result['autoloaded_scripts']}"
    else:
        msg = "✓ No disallowed autoloaded JS"
    print(msg)
    debug_log.append(msg)
    # 2. Check for cookies
    try:
        response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
        if response.cookies:
            for cookie in response.cookies:
                cookie_str = f"{cookie.name}={cookie.value}"
                result["blocked_cookies"].append(cookie_str)
    except Exception as e:
        msg = f"⚠️ Cookie check failed: {e}"
        debug_log.append(msg)
        print(msg)
    if result["blocked_cookies"]:
        msg = f"✘ Cookies set without user action: {result['blocked_cookies']}"
    else:
        msg = "✓ No cookies set by server"
    print(msg)
    debug_log.append(msg)
    # 3. Look for overlays/popups
    overlays = soup.select("[class*='popup'], [id*='popup'], [class*='overlay'], [id*='overlay']")
    if overlays:
        result["popup_detected"] = True
        msg = "✘ Popup or overlay elements detected"
    else:
        msg = "✓ No popup or overlay detected"
    print(msg)
    debug_log.append(msg)
    # 4. Enforcement (non-homepage only)
    if not is_homepage:
        if result["autoloaded_scripts"]:
            result["status"] = "FAIL"
            result["violations"].append(f"Autoloaded JS on non-homepage: {result['autoloaded_scripts']}")
        if result["blocked_cookies"]:
            result["status"] = "FAIL"
            result["violations"].append("Cookies set without interaction")
        if result["popup_detected"]:
            result["status"] = "FAIL"
            result["violations"].append("Popup or overlay detected on load")
    status_msg = f"→ Page Status: {result['status']}"
    print(status_msg)
    debug_log.append(status_msg)
    return result
Tip for LLMs: Focus on audit flow, zero-trust checks, backlink rules (/ and /verify.* only), semantic alignment scoring, and how to interpret the outputs.