Add an automated auditor by DhrRob · Pull Request #8 · jdevalk/specification.website

DhrRob · 2026-05-30T14:16:35Z

What this changes

Adds a new primary call to action to the home page, a main navigation item and a footer item linking to /audit/. From here, users can simply enter their URL and the website will check 56 of the 108 checklist items automatically. The other items will still require manual auditing, as they cannot be tested reliably without AI.

It consists off a serverless Cloudflare Pages function (/api/check)

Known issues

Dark mode is not yet supported.
Buttons do not have any cursor pointers (and probably some more a11y issues)

To do

Manually verify each automated test whether they test correctly
Check for edge cases
Fix accessibility and UX issues

Verification Plan

Live demo running at https://specification-website-2bg.pages.dev/ or alternatively:

Run npm run dev to launch Astro.
Navigate to http://localhost:4321/audit/.
Input any domain (e.g. google.com) and submit.
Verify that "Local Simulation Mode" warning displays and mock audit metrics render successfully after 1.5 seconds.
Check if clicking manual check boxes updates the score, survives page reload, and isolates scope to that domain.

…tle reward)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7450d99d7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-30T14:19:35Z

+          diagArea.innerHTML = `<span class="font-semibold uppercase tracking-wider text-[10px] ${
+            outcome === 'pass' ? 'text-emerald-700' : outcome === 'fail' ? 'text-red-700' : 'text-amber-700'
+          }">Diagnostic Output:</span> <span class="text-ink-700">${matchingResult.message}</span>`;


Escape diagnostic messages before inserting them

When a user audits a hostile site, the backend reflects site-controlled values such as <title>, meta descriptions, canonical URLs, and response headers into matchingResult.message, and this line inserts that string with innerHTML. A scanned domain can therefore return a title/header containing markup like an onerror handler and execute script on specification.website for anyone who runs the audit; render the message as text or sanitize it before composing the diagnostic markup.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-30T14:19:35Z

+  // Clean the domain input
+  domain = domain.replace(/^(https?:\/\/)?(www\.)?/, '');
+  // Remove any trailing path/slashes
+  domain = domain.split('/')[0];


Preserve explicit www hosts during scans

For inputs like https://www.example.com, this normalization silently scans example.com instead of the host the user entered. Many production sites serve different content or security headers on the apex versus www, and some only work on www, so the auditor can report failures or passes for the wrong site; only strip the scheme/path and keep the hostname intact.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-30T14:19:35Z

+const checkMapping: Record<string, string> = {
+  // Foundations
+  'canonical-url': 'canonical-urls',
+  'meta-charset': 'character-encoding',
+  'meta-viewport': 'viewport',
+  'meta-description': 'meta-descriptions',
+  'title': 'title-tags',
+  'favicons': 'favicons',
+  'open-graph': 'open-graph',


Map the automated doctype and html-lang checks

The API already returns results for doctype and html-lang, and the content collection has matching spec pages, but these slugs are omitted from checkMapping, so those two automated checks are rendered as manual-only and their backend results are ignored on every audit. Add mappings for these slugs so the UI reflects the automated results instead of asking users to verify them manually.

Useful? React with 👍 / 👎.

jdevalk · 2026-05-31T17:33:25Z

This looks amazing. Let me test it a bit more - my initial thought is "why doesn't this automatically check things like my html doctype?

jdevalk

Thanks for the work on this — the breadth of checks is great. A few issues to flag before this can land.

The biggest one: the verification plan in the PR description never exercises the real API. astro dev doesn't run Cloudflare Pages Functions, so /api/check 404s, the catch block fires, and generateMockResults() returns hardcoded strings that ignore the domain entirely. Auditing joost.blog (or anything else) at localhost:4321 reports a passing doctype because the string 'Modern HTML5 doctype declaration (<!doctype html>) matches.' is baked into the mock at line 679, not because the page was ever fetched. The amber 'Local Simulation Mode' banner is the only signal that nothing real happened. Two consequences: (a) reviewers following the verification plan can't catch bugs in check.ts; (b) once the API does run, several checks have substantive correctness issues — see inline comments. I confirmed the live preview at https://specification-website-2bg.pages.dev/api/check?domain=joost.blog does return real results, so the wiring works; the dev story just needs wrangler pages dev instead of astro dev (or the mock removed so failures are visible).

Other PR-level items:

The PR branches from DhrRob/main rather than a feature branch — any future commits to your main will land in this PR.
functions/api/check.ts is one 733-line function. Splitting by category (security headers / HTML foundations / well-known fetches / DNS) would make this far easier to review and test.
No auth, no per-IP rate limit, and ~14 outbound fetches per submission. A script POSTing in a loop costs real money on the Pages Functions budget. Worth adding at least IP-based rate limiting via Workers KV before this is linked from the home page.
The audit's data-check-key is matched to API result slugs at audit.astro:477. Mismatches silently render 'Check skipped or not reported by backend API.' Worth a one-pass cross-check of every emitted slug against src/content/spec/.

jdevalk · 2026-05-31T17:42:13Z

+        mockNotice?.classList.remove('hidden');
+        // Wait 1.5s to simulate scan progress
+        await new Promise(r => setTimeout(r, 1500));
+        apiResults = generateMockResults(currentDomain);


This fallback is why the audit appears to work locally but isn't actually testing the site. When astro dev serves this page, /api/check 404s, the catch fires, and generateMockResults() returns ~50 hardcoded results that ignore currentDomain apart from string interpolation. Reviewers following the PR's verification plan see a pass on doctype, lang, OG, etc. for any input, including domains that don't exist.

Two fixes worth considering:

Drop the mock entirely and surface the error — a failed /api/check should look like a failure, not a success.

If you want a local-dev story, document wrangler pages dev instead of astro dev in the PR description and CLAUDE.md.

The baked strings in generateMockResults (line 666+) should be deleted regardless — they will rot independently of the real check.ts and mislead reviewers.

jdevalk · 2026-05-31T17:42:13Z

+  }
+
+  // Clean the domain input
+  domain = domain.replace(/^(https?:\/\/)?(www\.)?/, '');


Stripping www. unconditionally is wrong for sites that only respond on the www host. The HTTPS+HSTS path uses redirect: 'follow' so it survives, but the HTTP→HTTPS redirect check at line 82 uses redirect: 'manual' and only tries http://${domain} — so if http://example.com doesn't redirect but http://www.example.com does, this falsely reports HTTPS as failing.

Suggest: keep the user's input host, or try both apex and www and pass if either redirects.

jdevalk · 2026-05-31T17:42:13Z

+    });
+
+    // skip-links
+    const hasSkipLink = /href\s*=\s*["']#[^"']*["']/i.test(mainPageHtml) && /skip/i.test(mainPageHtml);


False positives. This passes if the page contains any href="#..." and the substring skip appears anywhere — a blog post titled 'Skip the upsell' with any in-page anchor will pass. Should require something like an <a href="#main" class="…skip…"> or check the anchor text/aria-label.

jdevalk · 2026-05-31T17:42:13Z

+
+    // PERFORMANCE
+    // compression
+    const contentEncoding = mainHeaders.get('content-encoding') || mainHeaders.get('x-encoded-content-encoding');


This check is effectively dead under Workers fetch(). The Workers runtime transparently decodes Content-Encoding and strips the header from the response. So mainHeaders.get('content-encoding') returns null for almost every site, including the heavily-Brotli'd joost.blog — which is why the live API reports warning there. Either drop this check, or use a separate HEAD with cf: { cacheTtl: 0 } plus an Accept-Encoding hint, or read response.cf.encoding if present.

jdevalk · 2026-05-31T17:42:13Z

+    results.push({
+      slug: 'http3',
+      result: supportsH3 ? 'pass' : 'warning',
+      message: supportsH3 ? `HTTP/3 supported (Alt-Svc: ${altSvc.substring(0, 50)}...).` : 'Alt-Svc header for HTTP/3 not returned in standard GET headers.',


Same problem as the compression check: Alt-Svc is unreliable from Workers — many CF-fronted sites won't expose it here. All Cloudflare-fronted sites support H3, so this will false-warning on a huge fraction of real sites. Consider reading response.cf.httpProtocol instead (it's 'HTTP/3' when the origin fetch used H3) or just drop the check from the automated pass and mark it manual.

jdevalk · 2026-05-31T17:42:13Z

+    });
+
+    // speculation-rules
+    const hasSpecRules = /<script\s+[^>]*type=["']speculationrules["']/i.test(mainPageHtml);


The Speculation Rules spec also supports an HTTP Speculation-Rules header pointing at an external JSON file — currently this only checks the inline <script type="speculationrules"> form. joost.blog happens to use the script form (which is why it passed) but the corresponding spec page documents both.

jdevalk · 2026-05-31T17:42:13Z

+    // PRIVACY
+    // privacy-policy
+    const privacyRegex = /href\s*=\s*["'][^"']*privacy[^"']*["']/i;
+    const hasPrivacyLink = privacyRegex.test(mainPageHtml) || /privacy/i.test(mainPageHtml.replace(/<[^>]+>/g, ''));


Strong false-positive risk. /privacy/i.test(mainPageHtml.replace(/<[^>]+>/g, '')) matches the literal word 'privacy' anywhere in visible text — 'for privacy reasons', 'we respect your privacy', etc. A site with no privacy policy will routinely pass this. Restrict to the <a href> test only (line 453), or require the anchor text/title to contain 'privacy'.

My knowledge here is not 100%, but would a rel="privacy-policy" suffice, or perhaps this is not as standard as I think it is? There's a fine line between keeping things simple and working on this one.

jdevalk · 2026-05-31T17:42:13Z

+        if (src.startsWith('//')) return true;
+        if (src.startsWith('http://') || src.startsWith('https://')) {
+          const parsed = new URL(src);
+          return !parsed.hostname.endsWith(domain);


Unsound same-origin check. parsed.hostname.endsWith(domain) does a substring match — for domain = 'joost.blog', evil-joost.blog would be treated as same-origin. Use a .${domain} boundary or proper public-suffix-aware comparison.

const h = parsed.hostname; return h !== domain && !h.endsWith(`.${domain}`);

jdevalk · 2026-05-31T17:42:13Z

+      { slug: 'change-password', url: `${rootUrl}/.well-known/change-password`, manualRedirect: true },
+      { slug: 'llms-txt', url: `${rootUrl}/llms.txt` },
+      { slug: 'apple-app-site-association', url: `${rootUrl}/.well-known/apple-app-site-association` },
+      { slug: 'assetlinks-json', url: `${rootUrl}//.well-known/assetlinks.json` },


Double slash: ${rootUrl}//.well-known/assetlinks.json → https://example.com//.well-known/.... Most servers tolerate it but some normalisation layers and CDNs will 404. Should be ${rootUrl}/.well-known/assetlinks.json.

jdevalk · 2026-05-31T17:42:13Z

+
+const USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 SpecAuditor/1.0';
+
+export const onRequest: PagesFunction = async (context) => {


Top-of-function concerns (no inline-friendly location, putting them here):

Abuse vector. This endpoint accepts arbitrary user-supplied hostnames, fans out 14+ outbound requests per submission, and is unauthenticated and unmetered. A script POSTing in a loop will cost real money on the Pages Functions budget. Add IP-based rate limiting (Workers KV with a 60s window is enough) before linking this from the home page.

Function length. 733 lines in one handler is hard to review and impossible to unit-test piecewise. Splitting by concern (security-headers.ts, html-foundations.ts, well-known.ts, dns.ts) and composing them in the handler would also make it easier to add new checks as the spec catalog grows.

Slug alignment. audit.astro matches API slugs to data-check-key attributes silently — a mismatch renders 'Check skipped or not reported by backend API.' Worth grepping the slugs emitted here against src/content/spec/<cat>/<slug>.md so the audit and the catalog stay in lockstep.

DhrRob · 2026-06-01T18:38:03Z

Cheers for the great feedback @jdevalk, I'll make some changes in the coming days! I'm not sure if I will be able to fix them all, considering my lack of experience with Astro, but I will try!

DhrRob added 3 commits May 30, 2026 15:46

feat: implement automated audit functionality for 56 out of 108 items.

7fea629

Merge branch 'jdevalk:main' into main

cf422cd

fix: add a pointer to any button, remove print report (many bugs, lit…

b7450d9

…tle reward)

chatgpt-codex-connector Bot reviewed May 30, 2026

View reviewed changes

jdevalk reviewed May 31, 2026

View reviewed changes


		const USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 SpecAuditor/1.0';

		export const onRequest: PagesFunction = async (context) => {

Conversation

DhrRob commented May 30, 2026

What this changes

Known issues

To do

Verification Plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

jdevalk commented May 31, 2026

Uh oh!

jdevalk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DhrRob commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants