Technical SEO

The Technical SEO Guide

The complete technical SEO framework — crawlability, indexation, Core Web Vitals, schema markup, JavaScript SEO, and the prioritization model we use on every engagement.

Ahsan Soomro | May 20, 2026 | 22 min read

Technical SEO is the work that happens beneath the content layer. While on-page SEO focuses on what a page says and link building determines how many sites vouch for it, technical SEO determines whether search engines can find the page, render it correctly, and add it to their index at all.

Most ranking failures trace back to technical problems nobody noticed. A page blocked by robots.txt, an infinite crawl loop from faceted navigation, duplicate content diluting authority, or JavaScript content that never renders — none of these surface in standard analytics dashboards, but all of them suppress organic traffic.

This guide covers the complete technical SEO framework: the audit methodology, the fixes that actually move rankings, and the 20% of technical work that drives 80% of organic outcomes.

Key takeaways

Technical SEO is the prerequisite layer — content and links compound poorly on a broken foundation
Most technical problems are invisible in standard analytics — you have to actively audit for them
Crawlability, indexation, and Core Web Vitals are the three highest-leverage areas
Schema markup has become a technical SEO discipline in 2026 — it is foundational for AI engine visibility, not optional
Prioritize by traffic-weighted impact, not by severity — a minor issue on a high-traffic template beats a major issue on an orphan page

What technical SEO actually does

Technical SEO answers four questions for every page on a site:

Can search engines find it? (crawlability)
Will they add it to the index? (indexation)
Can they understand what it is? (structured data and semantics)
Does it perform well enough to deserve a ranking? (Core Web Vitals)

If the answer to any of these is no, the page won't rank regardless of content quality or backlink count. Technical SEO makes every other SEO investment possible.

91%

of pages on the web get zero organic traffic from Google, and crawlability or indexation problems are a frequent, often unnoticed cause

Source: Ahrefs, 2024

Crawlability

Crawlability determines whether search engine bots can access and fetch your pages. Problems here are foundational — if Googlebot can't reach a page, nothing else matters.

Robots.txt

Your robots.txt file at yourdomain.com/robots.txt tells crawlers which pages they may and may not fetch. The most damaging configuration errors are straightforward but catastrophic:

Disallow: / (under User-agent: *) blocks every compliant crawler from the entire site. This is almost always a staging configuration accidentally deployed to production, and it has caused significant, measurable traffic losses at companies large enough to know better. Check robots.txt as part of every deployment.

Disallowing CSS or JavaScript files — prevents Google from rendering your pages correctly. If Googlebot can't access your stylesheet, it evaluates your pages without layout context. Render quality affects indexation decisions.

Blocking paginated URLs that contain valuable content — a common mistake when teams try to clean up "duplicate" content without understanding what they're blocking.

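Check every robots.txt change before deploying it. After deployment, the robots.txt report in Search Console (which replaced the older robots.txt Tester in late 2023) shows the version Google last fetched and any parsing errors.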
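A deployment check for robots.txt can be small. Below is a minimal TypeScript sketch of a gate you could run in CI against the file you are about to ship. It implements a simplified reading of the RFC 9309 matching rules (the longest matching pattern wins, Allow wins a tie, `*` and `$` wildcards, a crawler's own group overrides the `*` group). It is not Google's parser, and the critical paths in the example are hypothetical.

```typescript
// Simplified robots.txt matcher: a sketch of RFC 9309 rules, not Google's parser.
type Rule = { allow: boolean; pattern: string };

// Group Allow/Disallow rules under each lowercase user-agent token.
function parseRobots(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let inAgentRun = false; // consecutive User-agent lines share one group
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      if (!inAgentRun) agents = []; // a new group starts
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      inAgentRun = true;
    } else if (key === "allow" || key === "disallow") {
      inAgentRun = false;
      if (value === "") continue; // an empty Disallow allows everything
      for (const agent of agents) {
        groups.get(agent)?.push({ allow: key === "allow", pattern: value });
      }
    }
  }
  return groups;
}

// Translate a robots.txt path pattern into a regular expression anchored at the start.
function patternToRegExp(pattern: string): RegExp {
  const endAnchored = pattern.endsWith("$");
  const body = endAnchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (endAnchored ? "$" : ""));
}

function isAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const groups = parseRobots(robotsTxt);
  // Use the crawler's own group if present, otherwise the "*" group.
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get("*") ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieWonByAllow =
      best !== undefined && rule.pattern.length === best.pattern.length && rule.allow;
    if (longer || tieWonByAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means allowed
}

// Example deploy gate with hypothetical critical paths.
const shipped = "User-agent: *\nDisallow: /\n"; // a staging file deployed by mistake
const criticalPaths = ["/", "/pricing", "/assets/app.css"];
const blocked = criticalPaths.filter((p) => !isAllowed(shipped, "Googlebot", p));
// All three paths are blocked here; a real CI step would exit non-zero.
if (blocked.length > 0) console.error(`robots.txt blocks: ${blocked.join(", ")}`);
```

Checking the stylesheet path alongside page paths also catches the CSS and JavaScript blocking problem described above.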

Robots.txt controls crawling, not indexation

A page blocked by robots.txt can still appear in search results if other pages link to it. Google can index a URL it has never crawled based on links alone. To prevent indexation, use a noindex meta tag — which requires the page to be crawlable in the first place. These two directives serve different purposes and are not interchangeable.

XML sitemaps

Your sitemap tells crawlers which URLs to prioritize. A well-built sitemap includes only:

Pages returning 200 HTTP status — no redirects, no soft 404s
Canonical URLs — not the non-canonical variants pointing elsewhere
Indexable pages — not pages with noindex declared

Including noindex pages in your sitemap sends conflicting signals. The best sitemaps are dynamically generated to auto-update as content is published — Next.js's sitemap.ts, properly-configured WordPress plugins, and CMS-generated feeds all handle this correctly.

Submit your sitemap in Google Search Console and monitor the Submitted/Indexed count. Large gaps between submitted and indexed URLs indicate either quality issues (Google choosing not to index) or crawl coverage gaps.
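The three inclusion rules above amount to a filter over crawl data. Here is a minimal TypeScript sketch, assuming you already have a per-URL record (final status, declared canonical, noindex flag) from your CMS or a site crawl. The `PageRecord` shape is hypothetical. Soft 404s are out of scope because they return 200 and need a content check, and a framework such as Next.js would normally write the XML for you.

```typescript
// Hypothetical per-URL record, e.g. exported from a site crawl or the CMS.
interface PageRecord {
  url: string;
  status: number; // final HTTP status of this exact URL
  canonical: string; // URL declared in rel="canonical"
  noindex: boolean; // meta robots or X-Robots-Tag noindex present
  lastModified?: string; // ISO 8601 date
}

// Keep only URLs that return 200, are self-canonical, and are indexable.
function sitemapEligible(pages: PageRecord[]): PageRecord[] {
  return pages.filter((p) => p.status === 200 && p.canonical === p.url && !p.noindex);
}

// The sitemap protocol requires entity-escaping these five characters.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(pages: PageRecord[]): string {
  const entries = sitemapEligible(pages).map((p) => {
    const lastmod = p.lastModified ? `<lastmod>${escapeXml(p.lastModified)}</lastmod>` : "";
    return `  <url><loc>${escapeXml(p.url)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```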

Crawl budget

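Crawl budget (the practical limit on how many pages Googlebot will crawl on a site in a given period) matters primarily for large sites. Google's own guidance targets sites with more than a million URLs, or more than 10,000 URLs whose content changes daily. The main levers are: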

Ensure high-value pages are reachable through shallow paths (3 clicks or fewer from the homepage)
Remove low-value pages from the crawl path: parameter-based URLs, filtered views, empty tag pages
Fix redirect chains: each hop is an extra request that spends crawl capacity and adds latency, and Googlebot stops following after 10 hops
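Click depth is a breadth-first search over the internal link graph, starting from the homepage. Here is a sketch, assuming you have exported each page's outgoing internal links from a crawler. The `LinkGraph` shape and helper names are hypothetical. Pages that the search never reaches cannot be reached through links at all.

```typescript
// Internal link graph: page path -> paths it links to (hypothetical crawl export).
type LinkGraph = Map<string, string[]>;

// Breadth-first search from the homepage; depth = minimum clicks to reach a page.
function clickDepths(graph: LinkGraph, home = "/"): Map<string, number> {
  const depth = new Map<string, number>([[home, 0]]);
  const queue: string[] = [home];
  for (let i = 0; i < queue.length; i++) {
    const page = queue[i];
    const current = depth.get(page) ?? 0;
    for (const target of graph.get(page) ?? []) {
      if (!depth.has(target)) {
        depth.set(target, current + 1); // first visit is the shortest path in BFS
        queue.push(target);
      }
    }
  }
  return depth;
}

// Pages deeper than maxDepth clicks, or unreachable from the homepage entirely.
function tooDeep(graph: LinkGraph, allPages: string[], maxDepth = 3): string[] {
  const depth = clickDepths(graph);
  return allPages.filter((p) => (depth.get(p) ?? Infinity) > maxDepth);
}
```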

Log file analysis is the only definitive view of actual crawl behavior. The Crawl Stats report in Google Search Console provides the high-level view: total Googlebot requests per day, average response time, and breakdown by HTTP status code.
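A first pass at log file analysis is mostly counting. This sketch assumes access logs in the common Apache/Nginx "combined" format and trusts the user-agent string, which anyone can spoof. Google documents a reverse DNS lookup for verifying that a request really came from Googlebot, and a production version should apply it.

```typescript
// Combined log format:
// ip - user [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
const COMBINED =
  /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/;

interface CrawlSummary {
  total: number;
  byStatus: Map<number, number>;
  byPath: Map<string, number>;
}

// Count requests whose user agent claims to be Googlebot (unverified).
function summarizeGooglebot(logLines: string[]): CrawlSummary {
  const summary: CrawlSummary = { total: 0, byStatus: new Map(), byPath: new Map() };
  for (const line of logLines) {
    const m = COMBINED.exec(line);
    if (!m || !m[6].includes("Googlebot")) continue; // skip unparsable and non-Googlebot lines
    const path = m[4];
    const status = Number(m[5]);
    summary.total += 1;
    summary.byStatus.set(status, (summary.byStatus.get(status) ?? 0) + 1);
    summary.byPath.set(path, (summary.byPath.get(path) ?? 0) + 1);
  }
  return summary;
}
```

Sorting `byPath` by count shows where crawl activity actually goes. A large share of hits on parameter URLs or redirects is the crawl waste described above.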

Indexation

Crawling and indexation are separate steps. A page can be crawled but not indexed for reasons under your control or under Google's control.

Under your control:

<meta name="robots" content="noindex"> explicitly excluded it
Canonical pointing to a different URL tells Google this isn't the primary version
The page redirects and Google indexes the destination, not the source
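All three of these can be detected from a single fetch. Here is a rough TypeScript sketch, assuming you pass in the final status code, lowercase response headers, and HTML. It leans on regular expressions for brevity, which is acceptable for spot checks but no substitute for a real HTML parser. The `FetchResult` shape and function name are hypothetical.

```typescript
interface FetchResult {
  url: string;
  status: number;
  headers: Record<string, string>; // lowercase header names
  html: string;
}

// Return the reasons a URL is excluded from the index by the site itself.
function selfInflictedExclusions(page: FetchResult): string[] {
  const reasons: string[] = [];
  if (page.status >= 300 && page.status < 400) {
    reasons.push(`redirects (${page.status}) to ${page.headers["location"] ?? "unknown"}`);
  }
  if (/noindex/i.test(page.headers["x-robots-tag"] ?? "")) {
    reasons.push("X-Robots-Tag header contains noindex");
  }
  // Inspect every <meta> tag, since name/content attribute order varies.
  for (const tag of page.html.match(/<meta\b[^>]*>/gi) ?? []) {
    const isRobots = /\bname\s*=\s*["']?(robots|googlebot)["']?/i.test(tag);
    if (isRobots && /\bcontent\s*=\s*["'][^"']*noindex/i.test(tag)) {
      reasons.push("meta robots noindex");
    }
  }
  // Compare the canonical (resolved against the page URL) with the page itself.
  const canonical = page.html.match(/<link\b[^>]*\brel\s*=\s*["']canonical["'][^>]*>/i);
  const href = canonical?.[0].match(/\bhref\s*=\s*["']([^"']+)["']/i)?.[1];
  if (href && new URL(href, page.url).href !== new URL(page.url).href) {
    reasons.push(`canonical points to ${href}`);
  }
  return reasons;
}
```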

Under Google's control:

Perceived low content quality — the most common reason for "Crawled — currently not indexed" in GSC
Near-duplicate content — too similar to other pages on the site to index separately
Soft 404 — the page returns 200 status but displays sparse, error-like, or empty content

The Page indexing report in Google Search Console (formerly Index Coverage) is your source of truth. Check it weekly for new exclusions. "Crawled - currently not indexed" at scale signals a content quality problem; "Discovered - currently not indexed" at scale signals a crawl budget problem.

When removing pages improves rankings

Removing low-quality pages from the index often improves the rankings of the remaining pages. Google evaluates quality partly at the site level, so a high percentage of thin, underperforming pages can drag down the signals of stronger pages. In our experience, de-indexing old press releases, empty tag pages, and low-traffic thin content often produces a rankings lift for the remaining pages within 30-60 days.

Core Web Vitals

Core Web Vitals are three real-user performance metrics Google uses as a confirmed ranking factor. They measure loading speed, interactivity, and visual stability:

Metric	What it measures	Good threshold (75th percentile of page loads)
LCP (Largest Contentful Paint)	How long the main content takes to render	2.5 seconds or less
INP (Interaction to Next Paint)	How quickly the page responds to user input	200 milliseconds or less
CLS (Cumulative Layout Shift)	How much the layout moves unexpectedly	0.1 or less

Google measures Core Web Vitals from real Chrome user data aggregated in the Chrome User Experience Report (CrUX) — not from synthetic Lighthouse scores. A page can score 95 on Lighthouse and still fail Core Web Vitals in real-user data if the traffic base skews toward slower connections or older devices.
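Google assesses field data at the 75th percentile of page loads, using published "good / needs improvement / poor" boundaries (LCP 2.5 s and 4 s, INP 200 ms and 500 ms, CLS 0.1 and 0.25). Here is a minimal sketch of that classification over raw samples you collect yourself, for example with a real-user monitoring script. The nearest-rank percentile is one common definition and CrUX aggregates differently, so treat the result as an approximation.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs improvement" | "poor";

// Upper bounds for "good" and "needs improvement" (LCP/INP in ms, CLS unitless).
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Nearest-rank percentile: one of several common definitions.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function rate(metric: Metric, samples: number[]): { p75: number; rating: Rating } {
  const p75 = percentile(samples, 75);
  const [good, poor] = THRESHOLDS[metric];
  const rating: Rating = p75 <= good ? "good" : p75 <= poor ? "needs improvement" : "poor";
  return { p75, rating };
}

// Two of these eight LCP samples are poor, yet the p75 is 2400 ms, which rates "good".
const example = rate("LCP", [1200, 1800, 2100, 2400, 5200, 6000, 1500, 2000]);
```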

24%

fewer abandoned page loads when a site meets all three Core Web Vitals thresholds than when it fails them

Source: Google, 2023

Where to focus

Template-level fixes have the highest ROI. A single CSS or image loading change applied to a product page template affects every product page simultaneously:

Largest Contentful Paint (LCP): Most commonly caused by unoptimized hero images (large JPEGs, no width/height attributes, not preloaded) or slow-loading web fonts. Use next/image or equivalent with priority prop for above-the-fold images. Preload critical fonts.

Interaction to Next Paint (INP): Long JavaScript tasks blocking the main thread. Look for heavy third-party scripts executing on page load; chat widgets, heavyweight analytics scripts, and tag managers loaded with many tags are frequent culprits.
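The standard remedy for long tasks is to break work into chunks and yield to the main thread between them, so pending input can be handled. Here is a sketch of that pattern with a hypothetical per-item `work` callback. It yields with `setTimeout(0)` because that works everywhere, though newer Chromium-based browsers also offer `scheduler.yield()`.

```typescript
// Give the event loop a chance to run input handlers and rendering.
function yieldToMainThread(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process items in time-boxed chunks instead of one long blocking task.
// Returns how many separate tasks the work was split into.
async function processInChunks<T>(
  items: T[],
  work: (item: T) => void,
  budgetMs = 50, // tasks over 50 ms count as "long tasks"
): Promise<number> {
  if (items.length === 0) return 0;
  let chunks = 1;
  let chunkStart = Date.now();
  for (let i = 0; i < items.length; i++) {
    work(items[i]);
    const moreToDo = i < items.length - 1;
    if (moreToDo && Date.now() - chunkStart >= budgetMs) {
      await yieldToMainThread(); // pending input can be handled here
      chunks += 1;
      chunkStart = Date.now();
    }
  }
  return chunks;
}
```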

Cumulative Layout Shift (CLS): Images and iframes without explicit dimensions. Ad slots that expand after load. Font swaps that change line heights. All of these shift layout for users after the initial paint. Reserve space for every element that loads asynchronously.
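It helps to know how CLS is actually scored. Layout shifts are grouped into session windows. A window ends after a gap of 1 second or more, or once it spans 5 seconds. Shifts within 500 ms of user input are excluded, and the page's CLS is the largest window's total. Here is a sketch of that calculation over a hypothetical array whose fields mirror the browser's layout-shift entries:

```typescript
// Shape mirrors the browser's layout-shift entry (startTime in ms).
interface ShiftEntry {
  startTime: number;
  value: number;
  hadRecentInput: boolean; // true if within 500 ms of user input
}

// CLS = the largest session-window sum of unexpected layout shifts.
function cumulativeLayoutShift(entries: ShiftEntry[]): number {
  let cls = 0;
  let windowValue = 0;
  let windowStart = 0;
  let lastShift = 0;
  for (const e of entries) {
    if (e.hadRecentInput) continue; // shifts right after input are expected
    const continuesWindow =
      windowValue > 0 && e.startTime - lastShift < 1000 && e.startTime - windowStart < 5000;
    if (continuesWindow) {
      windowValue += e.value;
    } else {
      windowValue = e.value; // start a new session window
      windowStart = e.startTime;
    }
    lastShift = e.startTime;
    cls = Math.max(cls, windowValue);
  }
  return cls;
}
```

Because only the worst window counts, one late-loading ad slot that shifts content in a burst can fail CLS even if the rest of the page is stable.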

Schema markup

Schema markup — structured data using the Schema.org vocabulary in JSON-LD format — has become a first-class technical SEO discipline in 2026. Google uses it for rich results. AI engines use it for citation. Both have gotten dramatically better at using it.

The minimum viable schema stack

Every page (sitewide):

Organization — who you are, how to contact you, your logo, sameAs links to LinkedIn/Crunchbase/Wikipedia
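WebSite: site name and URL (the SearchAction markup once used for the sitelinks search box no longer produces a search feature, since Google retired that feature in late 2024)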
BreadcrumbList — the hierarchical navigation path to this page
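Generating the sitewide JSON-LD from a single source keeps it identical across templates. Here is a minimal TypeScript sketch. Every name, URL, and profile link is a placeholder, and breadcrumb labels are derived naively from URL segments, where a real site would use its navigation titles.

```typescript
// Placeholder organization details: replace with your own.
const ORG = {
  name: "Example Co",
  url: "https://www.example.com",
  logo: "https://www.example.com/logo.png",
  sameAs: ["https://www.linkedin.com/company/example-co"],
};

function organizationSchema() {
  return { "@context": "https://schema.org", "@type": "Organization", ...ORG };
}

// Build a BreadcrumbList from the URL path, e.g. /guides/link-building.
function breadcrumbSchema(pageUrl: string) {
  const url = new URL(pageUrl);
  const segments = url.pathname.split("/").filter(Boolean);
  const items = [{ name: "Home", path: "/" }].concat(
    segments.map((seg, i) => ({
      name: seg.replace(/-/g, " ").replace(/\b\w/g, (c) => c.toUpperCase()),
      path: "/" + segments.slice(0, i + 1).join("/"),
    })),
  );
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: items.map((item, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: item.name,
      item: new URL(item.path, url.origin).href,
    })),
  };
}

// Serialize for a <script type="application/ld+json"> tag; escaping "<" stops a
// stray "</script>" inside a string value from closing the tag early.
function toJsonLdScript(data: object): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```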

By content type:

Blog posts and articles → Article with author, datePublished, dateModified, image
Service pages → Service with offers, areaServed, provider
FAQ sections → FAQPage with Question and Answer pairs
Team member pages → Person with sameAs LinkedIn, jobTitle, image
Glossary terms → DefinedTerm within a DefinedTermSet
Product pages → Product with Offer, AggregateRating, Review
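Content-type schema follows the same pattern: map fields you already store onto schema.org properties. Here is a sketch for Article, assuming a hypothetical `Post` record from your CMS. Only the output property names (`headline`, `image`, `datePublished`, `dateModified`, `author`) come from schema.org.

```typescript
// Hypothetical CMS record.
interface Post {
  title: string;
  url: string;
  authorName: string;
  authorProfileUrl?: string; // e.g. the author's LinkedIn page, used for sameAs
  published: Date;
  updated?: Date;
  heroImage: string;
}

function articleSchema(post: Post) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    mainEntityOfPage: post.url,
    image: [post.heroImage],
    datePublished: post.published.toISOString(),
    // Fall back to the publish date when the post has never been edited.
    dateModified: (post.updated ?? post.published).toISOString(),
    author: {
      "@type": "Person",
      name: post.authorName,
      // Only emit sameAs when a profile URL actually exists.
      ...(post.authorProfileUrl ? { sameAs: [post.authorProfileUrl] } : {}),
    },
  };
}
```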

Validate every implementation with validator.schema.org and Google's Rich Results Test. Invalid markup is ineligible for rich results, so an unvalidated implementation can silently do nothing.

Schema must match visible page content

Schema that doesn't match the visible page content (FAQ schema with no visible FAQ, Review schema with no reviews, HowTo schema on a paragraph-heavy article) violates Google's structured data guidelines. It can trigger a structured data manual action in Google Search Console and cost the page its rich result eligibility. Every schema type you implement must correspond to visible content on the page. No exceptions.
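One way to enforce this is a build-time check that every question and answer in your FAQPage markup appears in the page's visible text. Here is a rough sketch, assuming you have the rendered HTML and the FAQ pairs you emit. The tag-stripping `visibleText` helper is a crude stand-in: it also drops the JSON-LD itself, does not decode HTML entities, and uses deliberately strict substring matching.

```typescript
interface FaqPair {
  question: string;
  answer: string;
}

function normalize(s: string): string {
  return s.replace(/\s+/g, " ").trim().toLowerCase();
}

// Crude stand-in for "visible text": drop scripts/styles and tags, collapse whitespace.
function visibleText(html: string): string {
  return normalize(
    html.replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ").replace(/<[^>]+>/g, " "),
  );
}

// Return FAQ pairs from the markup whose question or answer is not on the page.
function faqMismatches(html: string, faqs: FaqPair[]): FaqPair[] {
  const text = visibleText(html);
  return faqs.filter(
    (f) => !text.includes(normalize(f.question)) || !text.includes(normalize(f.answer)),
  );
}
```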

Schema for AI engine citation

For AI Overviews, Perplexity, and other LLM-powered systems, schema serves an additional function: it gives models explicit, machine-readable facts about your brand to extract and cite accurately. The Organization schema's sameAs array, the Person schema's credentials, and the DefinedTerm schema's definitions are exactly the kind of unambiguous entity statements these systems can draw on.

If Gemini or ChatGPT gets your founding year wrong, your services wrong, or your name wrong — that's almost always an entity signal problem solvable with proper Organization + WebSite schema combined with consistent NAP (name, address, phone) across web properties.

Site architecture

Site architecture is the structure created by your URL hierarchy and internal linking decisions. Good architecture makes it clear to search engines which pages are most important and how topics relate to each other.

URL structure

Effective URLs are descriptive, readable, and consistent:

Hyphens, not underscores: /technical-seo not /technical_seo. Google treats hyphens as word separators but underscores as word joiners, so technical_seo can be read as a single word
Short and descriptive: /glossary/anchor-text ages better than /glossary/what-is-anchor-text-complete-guide-2026
Consistent trailing slash: pick one and 301-redirect the other to prevent duplicate content
Avoid dates in evergreen slugs: /guides/link-building ages gracefully; /2026/guides/link-building doesn't
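The slug and trailing-slash rules are straightforward to enforce in code. Here is a sketch, assuming the site has chosen no trailing slash as its convention. `slugify` and `canonicalRedirect` are illustrative helpers, not part of any framework, and the redirect itself would be issued as a 301 by your server or edge configuration.

```typescript
// Lowercase, ASCII-only, hyphen-separated slug from a page title.
function slugify(title: string): string {
  return title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip accents split off by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // spaces, underscores, punctuation become one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
}

// Return the URL to 301 to, or null if the URL already follows the
// (assumed) no-trailing-slash convention. The root path "/" is left alone.
function canonicalRedirect(rawUrl: string): string | null {
  const url = new URL(rawUrl);
  const path = url.pathname;
  if (path === "/" || !path.endsWith("/")) return null;
  url.pathname = path.replace(/\/+$/, ""); // query string is preserved
  return url.href;
}
```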

Internal linking

Internal linking is free link equity distribution. Every internal link from a high-traffic page to another page passes authority and accelerates Googlebot discovery. The highest-leverage internal linking actions:

Link from high-traffic, high-authority pages to newly published ones — accelerates indexation and authority transfer
Use descriptive anchor text that includes the target keyword where natural
Maintain pillar-and-cluster architecture: pillar links to every cluster post, every cluster post links back
Audit for orphan pages (no internal links pointing in) — these are rarely crawled and rarely rank regardless of content quality
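Orphan detection is a set difference: URLs you want indexed (for example, those in your sitemap) minus URLs that receive at least one internal link. Here is a sketch, assuming a hypothetical list of internal link edges from a crawl export. The inbound counts it builds also help spot conversion pages with few internal links.

```typescript
interface InternalLink {
  from: string;
  to: string;
}

// Count inbound internal links per target, ignoring self-links.
function inboundCounts(links: InternalLink[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const { from, to } of links) {
    if (from === to) continue;
    counts.set(to, (counts.get(to) ?? 0) + 1);
  }
  return counts;
}

// Sitemap URLs that no other page links to.
function findOrphans(sitemapUrls: string[], links: InternalLink[]): string[] {
  const counts = inboundCounts(links);
  return sitemapUrls.filter((url) => (counts.get(url) ?? 0) === 0);
}
```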

JavaScript SEO

Modern JavaScript-heavy sites introduce a crawling complication: Googlebot has to execute JavaScript before it can read dynamically rendered content, and rendering is queued separately from the initial crawl, with a variable delay in between. Until rendering happens, content that only exists after JavaScript runs is invisible to Google.

The gold standard is server-side rendering (SSR) or static site generation (SSG). With these approaches, Googlebot receives complete HTML on the first request — no JavaScript execution needed to read the content. Next.js, Nuxt, and SvelteKit all support SSR/SSG natively.

To diagnose JavaScript issues, run URL Inspection in Google Search Console and view the crawled (or live-tested) page to see the HTML Googlebot rendered, then compare it with the raw HTML your server returns. If the rendered HTML is sparse (missing navigation, main content, or structured data), you have a JavaScript rendering problem.
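A cruder version of this check can run at scale: fetch the raw server response, with no JavaScript executed, and confirm that key content is already in it. Here is a sketch, assuming Node 18+ for the global `fetch` and a hypothetical list of must-have snippets per template. It does not emulate Googlebot's renderer; it only shows what a non-rendering client receives.

```typescript
// Which expected snippets are missing from HTML that no JavaScript has touched?
function missingFromRawHtml(html: string, mustHave: string[]): string[] {
  return mustHave.filter((snippet) => !html.includes(snippet));
}

// Fetch the server response as-is; nothing here executes the page's scripts.
async function checkTemplate(url: string, mustHave: string[]): Promise<string[]> {
  const res = await fetch(url, { headers: { "user-agent": "raw-html-check" } });
  return missingFromRawHtml(await res.text(), mustHave);
}
```

For example, `checkTemplate("https://www.example.com/pricing", ["<h1", "application/ld+json"])` resolves to whichever snippets the raw HTML lacks. An SSR or SSG page should return an empty list, while a client-rendered SPA shell will typically be missing both.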

Single-page applications (SPAs) built with React, Vue, or Angular without SSR are the highest-risk configuration. If migrating to SSR is too large a project right now, dynamic rendering (serving pre-rendered HTML to Googlebot while serving the SPA to users) is an interim option. Google documents it as a workaround rather than a long-term solution, and it adds maintenance overhead, so treat it as temporary.

Technical SEO prioritization

Not all technical SEO issues deserve equal urgency. Prioritize by traffic impact relative to implementation effort: high-traffic template issues beat low-traffic page issues every time.

The sequence we use on every engagement:

Indexation blockers — noindex accidentally applied to revenue pages, canonical misconfigurations, robots.txt blocking key templates. Highest priority because they directly prevent ranking.
Core Web Vitals on high-traffic templates — homepage, product template, blog template. Template-level fixes apply to every page using that template simultaneously.
Schema implementation — Organization + BreadcrumbList sitewide as baseline. Article schema on content, Service schema on service pages. Foundation for AI engine visibility.
Internal linking gaps — orphan pages, pillar pages without cluster links, conversion pages with low internal link count.
Index cleanup — de-index thin content, duplicate pages, parameter-based URL variants.

Everything beyond this list is optimization on top of a working foundation. Don't reverse the order.
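The sequence can be written as a sort key: the tier from the list first, then traffic-weighted impact divided by effort within a tier, so cheaper fixes on busier templates rise to the top. This is a sketch under stated assumptions: the `Issue` fields, the 0 to 1 severity scale, and effort in days are hypothetical inputs your team would estimate.

```typescript
// Tiers follow the sequence above: 1 = indexation blockers ... 5 = index cleanup.
type Tier = 1 | 2 | 3 | 4 | 5;

interface Issue {
  name: string;
  tier: Tier;
  monthlyTrafficAffected: number; // organic sessions on affected URLs
  severity: number; // 0..1, share of that traffic at risk
  effortDays: number; // rough implementation estimate
}

// Traffic-weighted impact per unit of effort (effort floored to avoid dividing by ~0).
function score(issue: Issue): number {
  return (issue.monthlyTrafficAffected * issue.severity) / Math.max(issue.effortDays, 0.5);
}

// Earlier tiers first; within a tier, highest impact-per-effort first.
function prioritize(issues: Issue[]): Issue[] {
  return [...issues].sort((a, b) => a.tier - b.tier || score(b) - score(a));
}
```

Within a tier, this is what "a minor issue on a high-traffic template beats a major issue on an orphan page" looks like numerically.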

Common technical issues by site type

Ecommerce:

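Faceted navigation creating thousands of near-duplicate indexed URLs (canonicalize filtered views to the unfiltered category, or keep them out of the crawl with robots.txt; not both, since Google can't see a canonical tag on a page it is blocked from crawling)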
Soft 404s on deleted product pages returning 200 status (return 404 or 410, or redirect to category)
Missing Product + Offer + AggregateRating schema on product detail pages
Images without explicit width/height causing CLS on category pages
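For faceted navigation, the usual canonical target is the category URL without filter parameters. Here is a sketch, assuming a hypothetical set of filter, sort, and tracking parameters. Anything else (such as pagination) is kept and sorted alphabetically so equivalent URLs collapse to the same string, which you would then emit as the filtered view's `rel="canonical"`.

```typescript
// Hypothetical facet/sort/tracking parameters that never deserve their own indexed URL.
const NON_CANONICAL_PARAMS = new Set([
  "color", "size", "sort", "price_min", "price_max", "utm_source", "utm_medium",
]);

function canonicalCategoryUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept = [...url.searchParams.entries()]
    .filter(([key]) => !NON_CANONICAL_PARAMS.has(key))
    .sort(([a], [b]) => a.localeCompare(b)); // stable order for equivalent URLs
  url.search = new URLSearchParams(kept).toString(); // "" removes the "?" entirely
  url.hash = "";
  return url.href;
}
```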

SaaS:

Application subdomain (app.domain.com) accidentally indexed (noindex it or put it behind authentication; a robots.txt block alone won't remove URLs that are already indexed)
Thin integration pages with boilerplate copy and no unique value
Authenticated content returning 200 for logged-out users (should return 404 or redirect to login)

Content/publishing:

Tag and category pages duplicating indexed content (canonical or noindex the thin taxonomy pages)
Paginated article pages creating near-duplicate first-page content
Image-heavy posts failing LCP because hero images load lazily by default

Last updated May 2026.