{"description":"Structured real-site SEO/AEO studies for agents to learn reusable good patterns and avoidable failure modes.","objective":"Give a future agent enough structure to start from the main URL and understand how to inspect, plan, build, validate, and improve a serious website without needing prior chat context.","studies":[{"id":"nectiv-digital-seo-aeo-2026-06-04","siteName":"Nectiv Digital","url":"https://nectivdigital.com/","auditDate":"2026-06-04","evidenceRoot":"data/derived/site-studies/nectiv-digital-2026-06-04/","evidenceFiles":["data/derived/site-studies/nectiv-digital-2026-06-04/analysis-notes.md","data/derived/site-studies/nectiv-digital-2026-06-04/crawl_overview.csv"],"context":"Public agency site used as an SEO/AEO reference study for entity architecture, AI-search positioning, crawl discipline, and model-readable gaps.","strongPatterns":[{"pattern":"Narrow topical/entity focus","why":"The site consistently reinforces SEO, AEO/GEO, AI Search, ChatGPT, Perplexity, Google AI Mode, AI citations, and B2B brands instead of drifting into generic agency positioning.","reuseAs":"Build a small set of commercial entity pages, then support them with methods, proof, technology pages, and question-driven articles.","checkToAdd":"Confirm service, technology, proof, and article pages repeat a coherent entity set without unsupported keyword sprawl."},{"pattern":"Commercial pages map to distinct buying jobs","why":"Specific service pages make it easier for search engines, answer engines, and buyers to understand what the agency does.","reuseAs":"Create service pages around real buying intent rather than vague capability labels.","checkToAdd":"Confirm each primary service page has a unique purpose, title, H1, schema type, and internal-link role."},{"pattern":"Technology pages support differentiation","why":"Technology pages make the agency's approach inspectable instead of leaving differentiation as a broad claim.","reuseAs":"Add method, system, or technology pages when a site sells expertise that needs explanation.","checkToAdd":"Require differentiation pages to connect visible claims to proof, process, schema, and internal links."},{"pattern":"Entity schema is stable across page types","why":"Organization, WebSite, founder, Service, BlogPosting, and BreadcrumbList schema help connect visible content to machine-readable entity facts.","reuseAs":"Keep entity IDs and schema roles consistent across home, service, and article pages.","checkToAdd":"Validate that schema entities match visible page content and reuse stable IDs."}],"avoidPatterns":[{"issue":"www hostname TLS failure","why":"A certificate mismatch on https://www.nectivdigital.com/ weakens trust and can break a common direct-entry hostname before redirect logic is reached.","detectedBy":"Direct TLS/hostname check plus Screaming Frog domain verification.","checkToAdd":"Check apex, www, HTTP, HTTPS, redirect target, and certificate coverage before launch."},{"issue":"Missing llms.txt and llms-full.txt","why":"For a site selling AI-search/AEO services, missing model-readable entry points are a visible mismatch between positioning and technical surface.","detectedBy":"Direct requests to /llms.txt and /llms-full.txt returned 404.","checkToAdd":"Require llms.txt and llms-full.txt when a site claims AI-search, AEO, GEO, or agent-readiness expertise."},{"issue":"Indexable pages missing H1","why":"Indexable pages without an H1 reduce page clarity for users, crawlers, and answer systems.","detectedBy":"Screaming Frog h1_missing.csv flagged /ai-instructions and /contact.","checkToAdd":"Fail or warn on every indexable HTML page with no H1."},{"issue":"Indexable article missing meta description","why":"A missing description leaves search snippets and model summaries less controlled for a page designed to demonstrate AEO performance.","detectedBy":"Screaming Frog meta_description_missing.csv flagged the AEO experiment article.","checkToAdd":"Warn on indexable pages without descriptions, with an explicit exception path only for intentional no-description pages."},{"issue":"Crawler-visible internal media 403s","why":"Blocked embedded assets can degrade real page experience even when core HTML is indexable.","detectedBy":"Screaming Frog response_codes_client_error_(4xx).csv flagged internal media URLs.","checkToAdd":"Verify crawler-visible 403s directly before classifying them as broken or intentionally restricted."}],"agentTakeaways":["Good examples should be copied as principles, not as brand, copy, or proprietary assets.","Bad examples should become checks, backlog items, or explicit non-actions.","AEO claims should be backed by model-readable endpoints, schema, entity clarity, and crawl evidence.","Crawler findings need direct validation when a response code may reflect crawler blocking instead of real user failure."],"linkedRoadmapItems":["good-bad-example-library","seo-aeo-evaluation-layer","crawl-evidence-artifacts","real-site-comparisons","aeo-geo-model-context"]},{"id":"specification-website-agent-reference-2026-06-09","siteName":"Specification Website","url":"https://specification.website/","auditDate":"2026-06-09","evidenceRoot":"data/derived/site-studies/specification-website-2026-06-09/","evidenceFiles":["data/derived/site-studies/specification-website-2026-06-09/analysis-notes.md","data/derived/site-studies/specification-website-2026-06-09/crawl_overview.csv"],"context":"Agent-facing specification/documentation site used to teach machine-readable discovery surfaces, well-known/API links, reference-site information architecture, security headers, sitemap migration, external-reference maintenance, and method-aware endpoint classification.","strongPatterns":[{"pattern":"Agent and API discovery surfaces are first-class","why":"The site exposes llms.txt, llms-full.txt, .well-known discovery, API catalog links, MCP/A2A hints, agent skills discovery, sitemap index, RSS, and security policy through headers and routes.","reuseAs":"Model agent-facing websites with visible and protocol-level discovery surfaces so agents can recover purpose, files, APIs, and operating context from the main URL.","checkToAdd":"Require agent-facing sites to expose LLM files, manifest-style discovery, sitemap, security policy, and any API/agent cards through stable routes or response links."},{"pattern":"Specification IA uses stable category and detail routes","why":"The crawl found a broad set of indexable HTML routes organized around specification categories and detail pages, making traversal predictable for agents and crawlers.","reuseAs":"Use stable category/detail URL conventions for documentation-heavy reference sites.","checkToAdd":"Validate that documentation category pages, detail pages, checklists, and examples have explicit route purpose, title/H1 intent, and internal-link roles."},{"pattern":"Security and policy headers are part of the reference surface","why":"Direct response checks showed HSTS, CSP, frame denial, COOP/CORP, no-vary-search, permissions policy, referrer policy, and x-content-type-options.","reuseAs":"Treat production web hygiene as part of the website operating contract, not a post-launch infrastructure add-on.","checkToAdd":"Add production launch checks for security headers and document which headers are required, optional, or context-dependent."},{"pattern":"Sitemap index migration keeps legacy discovery working","why":"/sitemap.xml redirects to /sitemap-index.xml while headers advertise the sitemap index, preserving crawler discovery through a common legacy path.","reuseAs":"Use clean redirects for discovery-surface migrations when the canonical replacement is explicit and consistently advertised.","checkToAdd":"Allow sitemap redirects only when the final target returns XML, is advertised, and all sitemap URLs resolve."}],"avoidPatterns":[{"issue":"External standards/reference links can rot or block crawlers","why":"The 4xx export included stale or crawler-blocked external references across standards and documentation sites, which weakens reference trust if left unclassified.","detectedBy":"Screaming Frog response_codes_client_error_(4xx).csv plus direct classification notes in analysis-notes.md.","checkToAdd":"Run recurring external-link classification for reference-heavy sites and separate true 404s from crawler-blocked 403s or method-specific API responses."},{"issue":"Overview/detail page pairs can duplicate titles and H1s","why":"Duplicate title/H1 exports flagged category/detail pairs for well-known URIs, privacy policy, and agent readiness. These may be intentional, but agents need canonical rationale or differentiated titles.","detectedBy":"Screaming Frog page_titles_duplicate.csv and h1_duplicate.csv.","checkToAdd":"Require documentation category and detail pages to either differentiate title/H1 intent or document canonical/internal-link rationale."},{"issue":"Agent/API endpoints need method-aware classification","why":"An A2A endpoint returned 405 to a crawler GET request, which may be correct API behavior but should not be treated as an ordinary broken HTML page.","detectedBy":"Screaming Frog client-error export and analysis notes.","checkToAdd":"Classify API, MCP, A2A, and agent-card endpoints separately from HTML route crawl health."}],"agentTakeaways":["Agent-facing sites should expose discovery through HTML, headers, LLM files, well-known files, API catalogs, and agent cards.","Reference sites need external-link maintenance because authoritative standards links move or block crawlers over time.","Duplicate title/H1 pairs may be acceptable for overview/detail pairs only when page intent and canonical rationale are explicit.","Crawler-visible API errors need method-aware checks before they become broken-link tasks."],"linkedRoadmapItems":["good-bad-example-library","seo-aeo-evaluation-layer","real-site-comparisons","aeo-geo-model-context","crawl-evidence-artifacts"]},{"id":"ryan-pierce-personal-consultant-2026-05-21","siteName":"RyanPierce.ai","url":"https://www.ryanpierce.ai/","auditDate":"2026-05-21","evidenceRoot":"data/derived/site-studies/ryanpierce-ai-2026-05-21/","evidenceFiles":["data/derived/site-studies/ryanpierce-ai-2026-05-21/ryanpierce-ai-analysis.md","data/derived/site-studies/ryanpierce-ai-2026-05-21/crawl_overview.csv"],"context":"Small personal/consulting portfolio site used to teach canonical-host consistency, clean small-site metadata, external-link verification, and security-header hardening.","strongPatterns":[{"pattern":"Small site with clean on-page metadata","why":"The crawl showed no missing titles, descriptions, H1s, H2s, or image alt text on the crawled 200 HTML pages, proving a small expert site can keep basic SEO hygiene simple and complete.","reuseAs":"Use small personal sites as a controlled baseline for metadata, H1, canonical, and image-alt expectations before adding complex content systems.","checkToAdd":"Confirm every public HTML page has a title, meta description, H1, H2 where useful, canonical tag, and image alt policy."},{"pattern":"All important pages are shallow from the homepage","why":"A compact crawl depth helps agents, users, and crawlers recover the site structure quickly.","reuseAs":"Keep personal consultant portfolios shallow, with work examples and conversion paths reachable directly from the homepage.","checkToAdd":"Warn when core proof/work pages require more than two clicks from the homepage."}],"avoidPatterns":[{"issue":"Canonical host mismatch","why":"The live final host used www while canonical tags pointed to bare-domain URLs that redirected back to www, causing pages to be treated as canonicalized/non-indexable by the crawl.","detectedBy":"Screaming Frog canonical/indexability data plus direct bare-domain redirect verification.","checkToAdd":"Require canonicals, Open Graph URLs, sitemap URLs, redirects, and final host to agree before launch."},{"issue":"External profile links require browser validation","why":"Medium and LinkedIn returned crawler-block or bot-block style responses, so agents need to separate real broken links from crawler-visible blocks before changing public profile links.","detectedBy":"Screaming Frog external response codes plus direct HTTP checks.","checkToAdd":"Classify external 403/999/social responses as manual-verification warnings, not automatic broken-link failures."},{"issue":"Baseline security headers are incomplete","why":"A site can be SEO-clean but still lack CSP, X-Content-Type-Options, X-Frame-Options, and Referrer-Policy hardening.","detectedBy":"Direct response/header checks captured in the analysis notes.","checkToAdd":"Add launch QA for baseline security headers on production responses."}],"agentTakeaways":["Small sites are useful QA baselines because every page can be inspected deeply.","Canonical host mismatch can invalidate otherwise clean metadata.","External social/profile links need manual browser validation when crawler responses are ambiguous.","Security headers belong in launch QA even when they are not the primary SEO issue."],"linkedRoadmapItems":["good-bad-example-library","seo-aeo-evaluation-layer","crawl-evidence-artifacts","real-site-comparisons"]},{"id":"the-prep-casa-saas-2026-05-21","siteName":"The Prep Casa","url":"https://www.the-prep.casa/","auditDate":"2026-05-21","evidenceRoot":"data/derived/site-studies/the-prep-casa-2026-05-21/","evidenceFiles":["data/derived/site-studies/the-prep-casa-2026-05-21/the-prep-casa-analysis.md","data/derived/site-studies/the-prep-casa-2026-05-21/crawl_overview.csv"],"context":"SaaS/product site study used to teach head metadata rendering, canonical discipline, signup-page indexability decisions, image-alt policy, and direct-link hygiene.","strongPatterns":[{"pattern":"Healthy crawl response baseline","why":"The crawl found no internal 4xx, 5xx, no-response URLs, mixed content, or redirect chains, which makes deeper metadata and rendering issues easier to isolate.","reuseAs":"Use clean response-code health as the starting line for SaaS launch QA, not the finish line.","checkToAdd":"Separate response-code pass state from metadata, schema, accessibility, and rendered-head validation."},{"pattern":"Product information architecture has feature and resource routes","why":"Feature pages, resources, blog posts, and changelog-style content give agents a clear product-site pattern for mapping use cases to page types.","reuseAs":"Create SaaS playbooks that pair homepage, feature, resource, signup, and contact-sales routes with distinct SEO/AEO duties.","checkToAdd":"Require every SaaS feature route to have a unique title, description, H1, canonical, schema role, and internal-link purpose."}],"avoidPatterns":[{"issue":"Metadata and canonical tags appear outside the expected head surface","why":"Non-JS crawlers, social crawlers, and SEO tools may miss metadata if title, description, canonical, and robots signals are not present in the server-rendered head.","detectedBy":"Screaming Frog outside-head warnings plus browser-rendered spot checks.","checkToAdd":"Verify rendered and raw head metadata for title, description, canonical, robots, Open Graph, and schema on representative routes."},{"issue":"Indexable signup page lacks heading structure","why":"A signup route that remains indexable needs clear H1/H2 structure; otherwise it should have an explicit noindex decision.","detectedBy":"Screaming Frog h1_missing/h2_missing exports and browser-rendered spot checks.","checkToAdd":"Require every indexable conversion route to either expose a meaningful H1 or document a noindex policy."},{"issue":"Meaningful logo images lack alt policy","why":"Technology/trust logos can be decorative or meaningful, but agents must make that choice explicitly instead of leaving assistive technology behavior accidental.","detectedBy":"Screaming Frog images_missing_alt_text.csv.","checkToAdd":"Require image-alt decisions to classify logos as decorative alt-empty or meaningful with concise alt text."}],"agentTakeaways":["Clean crawl response codes do not prove SEO readiness.","SaaS sites need raw/rendered head validation because framework metadata behavior can drift.","Signup and conversion routes need explicit indexability decisions.","Feature-route IA should become a site-type playbook, not just a visual sitemap."],"linkedRoadmapItems":["good-bad-example-library","seo-aeo-evaluation-layer","site-type-build-playbooks","page-pattern-library","real-site-comparisons"]},{"id":"jordan-browne-moore-content-consultant-2026-06-04","siteName":"Jordan Browne-Moore","url":"https://www.jordanbrowne-moore.com/","auditDate":"2026-06-04","evidenceRoot":"data/derived/site-studies/jordan-browne-moore-2026-06-03/","evidenceFiles":["data/derived/site-studies/jordan-browne-moore-2026-06-03/jordan-browne-moore-website-audit.md","data/derived/site-studies/jordan-browne-moore-2026-06-03/crawl_overview.csv"],"context":"Content-led expert consultant site study used to teach the difference between rendered quality and crawler-visible raw HTML, plus domain health, sitemap/robots validity, soft 404s, and article-route SEO.","strongPatterns":[{"pattern":"Specific expert positioning and article depth","why":"The rendered site has clear fintech ML, LLM, credit-risk, fraud, and KYC positioning with article content that demonstrates real expertise rather than generic thought leadership.","reuseAs":"Use content-led consultant sites to show how proof, technical depth, and topic specificity can create authority signals.","checkToAdd":"Require expert/article sites to map articles to a coherent entity/topic set with proof and conversion paths."},{"pattern":"Rendered desktop and mobile layouts are coherent","why":"The browser evidence showed no horizontal overflow across tested routes and credible visual continuity across long article pages.","reuseAs":"Use rendered screenshots and Lighthouse/browser evidence to evaluate design operability after technical crawl checks.","checkToAdd":"Pair crawler raw-HTML checks with browser-rendered layout and accessibility checks for JavaScript-heavy sites."}],"avoidPatterns":[{"issue":"Apex domain fails while www works","why":"A strong site can lose direct users and natural backlinks when the bare domain returns an infrastructure error.","detectedBy":"Direct HTTP/domain checks recorded in the audit.","checkToAdd":"Require apex, www, HTTP, HTTPS, TLS, and canonical redirect checks before launch."},{"issue":"robots.txt and sitemap.xml return app HTML instead of valid files","why":"Crawler discovery surfaces must be real protocol files, not client-app fallback shells.","detectedBy":"Direct requests and Lighthouse/Screaming Frog findings.","checkToAdd":"Fail QA when robots.txt is not robots syntax or sitemap.xml is not XML."},{"issue":"Client-rendered SPA hides article content from raw HTML","why":"A content-led expert site depends on crawlable article text, headings, and internal links; browser-rendered quality is not enough if raw HTML is nearly empty.","detectedBy":"Screaming Frog raw crawl plus rendered Playwright evidence comparison.","checkToAdd":"Compare raw HTML and rendered DOM for core article routes and flag missing raw H1/content/internal links."},{"issue":"Unknown routes return 200 soft 404s","why":"Soft 404s waste crawl budget and can create low-value indexable URLs.","detectedBy":"Direct invalid-route status check in the audit.","checkToAdd":"Require unknown routes to return a real 404 status and useful 404 page."}],"agentTakeaways":["Rendered UX quality and crawler-visible SEO quality must be tested separately.","Content-led expert sites need SSR/SSG/prerendered article content or equivalent crawlable HTML.","Robots and sitemap files are protocol surfaces, not decorative endpoints.","Soft 404 behavior should be part of launch QA for every framework app."],"linkedRoadmapItems":["good-bad-example-library","seo-aeo-evaluation-layer","browser-visual-qa","crawl-evidence-artifacts","site-type-build-playbooks","real-site-comparisons"]},{"id":"poor-ai-generated-service-site-fixture-2026-06-04","siteName":"Poor AI-Generated Service Site Fixture","url":"fixture://poor-ai-generated-service-site","auditDate":"2026-06-04","evidenceRoot":"data/derived/site-studies/poor-ai-generated-service-site/","evidenceFiles":["data/derived/site-studies/poor-ai-generated-service-site/analysis-notes.md"],"context":"Intentionally weak local-service fixture used to teach agents what to reject: generic AI copy, unsupported claims, thin entity context, mismatched schema/content, and missing machine-readable surfaces.","strongPatterns":[{"pattern":"Visible offer and contact path exist","why":"Even weak AI-generated sites often include the basic commercial shape: a service offer, some FAQs, and a contact path. Agents should preserve useful intent while rebuilding the evidence layer.","reuseAs":"Treat visible offer/contact intent as a starting point, then require source-backed facts, specific service details, and validated metadata/schema.","checkToAdd":"Confirm every conversion path is tied to a real service, service area, source fact, and contact route."}],"avoidPatterns":[{"issue":"Generic copy with no source-backed entity model","why":"A page can sound plausible while failing to identify who the business is, what proof exists, where it operates, and which claims are approved.","detectedBy":"Fixture analysis notes and absence of source ledger/crawl/LLM evidence.","checkToAdd":"Require source-traceability claims before public copy, schema, FAQ answers, or LLM summaries can be marked ready."},{"issue":"FAQ, schema, and visible copy drift","why":"AI-generated FAQ answers often introduce service promises or proof claims that are not visible elsewhere or supported by sources.","detectedBy":"Fixture analysis notes.","checkToAdd":"Compare FAQ/schema facts against visible page sections and source-traceability records."},{"issue":"No agent-readable or crawl evidence surfaces","why":"Without llms files, sitemap, robots, crawl reports, and handoff evidence, future agents must infer readiness from appearance.","detectedBy":"Fixture analysis notes.","checkToAdd":"Block readiness claims until llms.txt, llms-full.txt, sitemap, robots, crawl report, and handoff report exist."}],"agentTakeaways":["Bad examples should be safe fixtures when copying a real weak site would be noisy or unfair.","Generic AI copy is not a content system; it needs source, claim, schema, and QA contracts.","Agents should salvage useful intent while rejecting unsupported claims and vague positioning.","A site can only be called ready when evidence surfaces exist, not when it looks complete."],"linkedRoadmapItems":["good-bad-example-library","source-to-site-traceability","seo-aeo-evaluation-layer","aeo-geo-model-context","handoff-reports"]}]}