Robots.txt
Also called Robots exclusion protocol, Robots exclusion standard.
A robots.txt file is a plain-text file at a domain's root that tells search engine crawlers which pages or directories they are and aren't allowed to crawl.
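A crawler can work out where the file lives from any page URL, because it always sits at `/robots.txt` on the same scheme, host, and port. As a result, `www.example.com`, `shop.example.com`, and an `http` vs `https` version of a site each have their own file. The minimal sketch below uses only Python's standard `urllib.parse`. The helper name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The file lives at /robots.txt on the same scheme and netloc
    (host plus any port), so each subdomain has its own file.
    """
    parts = urlsplit(page_url)
    # Keep scheme and netloc; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://www.example.com/blog/post?id=7#top"))
    print(robots_url("https://shop.example.com:8443/cart"))
```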
What it means
Robots.txt is the first thing a well-behaved search engine crawler checks when it visits a site. The file groups rules under User-agent lines, and each group uses Allow and Disallow rules to communicate crawl permissions to the crawlers it names. Under `User-agent: *`, a single `Disallow: /` blocks every compliant crawler from the entire site, while `Disallow: /admin/` blocks only the admin directory. Compliance is voluntary: crawlers that ignore robots.txt, including some AI scrapers, are not 'well-behaved' by this standard, and the file cannot stop them.
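To see how User-agent groups and Disallow rules combine, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the crawler name `ExampleBot` are made up for illustration. Python's parser applies the first matching rule with simple prefix matching, which is simpler than Google's own matching rules, so treat its answers as an approximation of how Googlebot would read the same file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a strict group for a made-up crawler
# ("ExampleBot") and a default group for every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content already in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "ExampleBot"):
        for url in ("https://example.com/admin/users", "https://example.com/blog/"):
            # True means the named crawler may fetch the URL.
            print(agent, url, rules.can_fetch(agent, url))
```

Googlebot falls through to the `*` group, so only `/admin/` is off limits to it, while `ExampleBot` matches its own group and is blocked everywhere.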
The most important misunderstanding about robots.txt is what it does and doesn't control. It governs crawling, not indexing. A page blocked by robots.txt can still appear in search results if other pages link to it: Google can index a URL it has never crawled based on link signals alone. To prevent indexing, you need a noindex directive (a robots meta tag or an X-Robots-Tag HTTP header) on the page itself. A crawler only sees that directive when it fetches the page, so the page must not be blocked in robots.txt; blocking it hides the noindex and defeats its purpose.
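The crawl-versus-index conflict can be caught with a simple audit: find pages that carry noindex but are blocked from crawling, since an obedient crawler will never see their noindex. This is a minimal sketch under stated assumptions. The function name `unseen_noindex` is hypothetical, and the `noindex_pages` mapping stands in for data you would export from a site crawl. It relies on Python's `urllib.robotparser`, whose matching is simpler than Google's.

```python
from urllib.robotparser import RobotFileParser


def unseen_noindex(robots_text: str, noindex_pages: dict[str, bool],
                   agent: str = "Googlebot") -> list[str]:
    """List URLs that carry noindex but that robots.txt blocks from crawling.

    noindex_pages maps URL -> whether the page has a noindex directive
    (hypothetical input, e.g. from a crawl export). A crawler that obeys
    robots.txt never fetches the returned URLs, so it never sees their
    noindex, and they can still be indexed from links.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [
        url for url, has_noindex in noindex_pages.items()
        if has_noindex and not parser.can_fetch(agent, url)
    ]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    pages = {
        "https://example.com/private/report": True,  # blocked + noindex: conflict
        "https://example.com/thanks": True,          # crawlable, noindex is seen
        "https://example.com/private/old": False,    # blocked, no noindex
    }
    print(unseen_noindex(robots, pages))
```

For each flagged URL, the usual fix is to lift the Disallow rule so the noindex can be read.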
The most damaging robots.txt mistake is accidentally blocking the wrong things. Mis-formatting a Disallow rule, omitting a trailing slash (rules match by path prefix, so `Disallow: /admin` also blocks `/admin.html` and `/administrator-guide`), or deploying a staging `Disallow: /` rule to production has caused measurable traffic losses, even at large sites. Robots.txt should be monitored as part of routine technical SEO, especially around deployment events.
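The trailing-slash pitfall comes from prefix matching, and it is easy to demonstrate. The sketch below feeds a one-rule robots.txt to Python's standard `urllib.robotparser` and lists which sample paths each variant blocks. The paths and the helper name `blocked_paths` are hypothetical. Google also treats rule paths as prefixes, so the contrast shown here carries over, although its full matching rules go further than Python's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths on a site with an admin area.
PATHS = ["/admin", "/admin/", "/admin/users", "/administrator-guide", "/admin.html"]


def blocked_paths(rule: str, paths: list[str]) -> list[str]:
    """Return the paths blocked by `User-agent: *` plus `Disallow: <rule>`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [p for p in paths if not parser.can_fetch("*", "https://example.com" + p)]


if __name__ == "__main__":
    # With the slash, only the directory and its contents are blocked.
    print("Disallow: /admin/ ->", blocked_paths("/admin/", PATHS))
    # Without it, every path that starts with "/admin" is blocked.
    print("Disallow: /admin  ->", blocked_paths("/admin", PATHS))
```

Note that `Disallow: /admin/` leaves `/admin` itself crawlable, so the "right" form depends on which URLs actually exist on the site.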
Key takeaways
- Robots.txt controls crawling, not indexation — a blocked page can still be indexed via links
- Incorrect configuration (especially Disallow: /) has caused significant traffic losses at real sites
- Compliance is voluntary: Googlebot and other reputable crawlers obey robots.txt, but the file cannot enforce its rules on crawlers that ignore it
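- Test all robots.txt changes before deploying, for example with Google's open-source robots.txt parser, and check the live file in Search Console's robots.txt report (Google has retired its standalone robots.txt Tester)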
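One way to test changes before deploying is a small pre-deploy check that fails when a candidate robots.txt would block URLs that must stay crawlable. This sketch assumes a hypothetical `MUST_CRAWL` list and a hypothetical `check_robots` helper, and uses Python's standard `urllib.robotparser`. Its matching is simpler than Google's, so it works as a smoke test that catches gross mistakes like a leftover staging `Disallow: /`, not as a replacement for Google's own tools.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical URLs that must remain crawlable in production.
MUST_CRAWL = [
    "https://example.com/",
    "https://example.com/products/widget",
    "https://example.com/blog/launch",
]


def check_robots(robots_text: str, must_crawl: list[str],
                 agent: str = "Googlebot") -> list[str]:
    """Return the must-crawl URLs that robots_text would block for agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in must_crawl if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    staging = "User-agent: *\nDisallow: /\n"
    production = "User-agent: *\nDisallow: /admin/\n"
    for name, text in (("staging", staging), ("production", production)):
        blocked = check_robots(text, MUST_CRAWL)
        # A CI job could exit non-zero whenever `blocked` is non-empty.
        print(name, "OK" if not blocked else f"BLOCKS {blocked}")
```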
Keep digging.
Crawl Budget
Crawl budget is the number of pages Googlebot will crawl on your site in a given timeframe, determined by your site's crawl capacity and Google's perceived demand to crawl it.
Indexation
Also: Indexing, Indexability
Indexation is the process by which a search engine adds a page to its searchable index after crawling it, and indexability is the property of being eligible for that process.
Technical SEO
Also: Technical optimization
Technical SEO is the practice of optimizing a website's infrastructure (crawlability, indexation, site speed, structured data, and URL architecture) so search engines can efficiently discover, parse, and rank it.
XML Sitemap
Also: Sitemap.xml, Sitemap file
An XML sitemap is a file that lists a website's important URLs in a structured format, giving search engines a reliable list of pages to discover and crawl. It is particularly useful for large sites or pages that lack strong internal links.