Robots.txt Guide: How to Write and Use It

April 2026 · 6 min read · ToolsBox Team

A complete guide to robots.txt — what it does, how to write rules, common mistakes and best practices.

How to Write a robots.txt File

The robots.txt file is a small but powerful text file at the root of your website that tells search engine crawlers which pages to visit and which to skip. A correctly written robots.txt helps search engines use their crawl budget on your most important pages. An incorrectly written one can accidentally block your entire site from Google — a mistake that is surprisingly common and can devastate rankings.

What Is robots.txt?

robots.txt is a plain text file placed at https://yourdomain.com/robots.txt. When a search engine crawler visits your site, it checks this file first. The file uses the Robots Exclusion Protocol — a standard that most major crawlers respect.

Important limitations to understand:

  • robots.txt is a request, not a firewall. Malicious bots ignore it. It is only for compliant crawlers like Googlebot and Bingbot.
  • Disallowing a URL prevents crawling but not indexing. Google can still index a URL it has never crawled if other pages link to it.
  • robots.txt rules apply to crawling only. For indexing control, use the noindex meta tag.
  • The file must be in the root directory. A robots.txt in a subfolder is ignored by crawlers.
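Because the file only counts at the root, the robots.txt location for any page can be derived from the page URL alone. The snippet below is a minimal sketch using Python's standard library; the function name robots_txt_url is a hypothetical helper, not part of any tool mentioned here.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://yourdomain.com/blog/post?page=2"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart/"))
# https://shop.yourdomain.com/robots.txt
```

Note that the second URL resolves to a different file: each subdomain is a separate host and needs its own robots.txt.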

The Basic Syntax

A robots.txt file contains groups of directives called "records." Each record starts with a User-agent line and is followed by Allow or Disallow lines:

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml

  • User-agent: * applies the rules in this group to all crawlers. Replace * with a specific bot name like Googlebot to target only that crawler.
  • Allow: / explicitly allows crawling of everything. This is the default, so the line can be omitted.
  • Disallow: /admin/ prevents crawling of the /admin/ directory and all URLs within it.
  • Sitemap: tells crawlers where your XML sitemap is. It is not part of the original Robots Exclusion Protocol, but the major search engine crawlers support it.
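To make the group structure concrete, here is a minimal Python sketch that splits a robots.txt file into its user-agent groups and sitemap lines. It is an illustration under simplifying assumptions (it only recognises the User-agent, Allow, Disallow, and Sitemap fields), and parse_robots_txt is a hypothetical name rather than a library function.

```python
def parse_robots_txt(text):
    """Split robots.txt text into (groups, sitemaps).

    groups maps a lowercased user-agent token to a list of
    (directive, path) tuples, in file order.
    """
    groups = {}
    sitemaps = []
    current_agents = []     # user-agents named by the group being read
    reading_rules = False   # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue                          # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:                 # a new group starts here
                current_agents, reading_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)            # applies to the whole file
    return groups, sitemaps

example = """\
User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""
groups, sitemaps = parse_robots_txt(example)
print(groups)    # {'*': [('allow', '/'), ('disallow', '/admin/')]}
print(sitemaps)  # ['https://yourdomain.com/sitemap.xml']
```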

Common robots.txt Patterns

Allow everything (simplest possible file):

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

Block specific directories:

User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/

Sitemap: https://yourdomain.com/sitemap.xml

Block all crawlers from everything (staging/dev sites):

User-agent: *
Disallow: /

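Use this on development or staging servers to keep compliant crawlers out. Because a disallowed URL can still be indexed if other pages link to it, password protection is the more reliable safeguard for staging. Never use this file on your live production site.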
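One way to stop a staging file from reaching production is a small check in the deployment script. The sketch below is a hypothetical guard, not a feature of any particular deployment tool: it flags a file in which a User-agent: * group contains a bare Disallow: /, and it deliberately ignores any Allow rules that might soften the block.

```python
def blocks_all_crawlers(robots_txt: str) -> bool:
    """Return True if a 'User-agent: *' group contains a bare 'Disallow: /'."""
    agents = []            # user-agents of the group currently being read
    in_rules = False       # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (p.strip() for p in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # previous group has ended
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False

staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /wp-admin/\n"
print(blocks_all_crawlers(staging))     # True
print(blocks_all_crawlers(production))  # False
```

A deployment script could refuse to publish when the function returns True for the production file.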

Block specific crawlers (e.g., to save bandwidth from SEO tool bots that honor robots.txt):

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
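A per-bot pattern like this can be checked locally with Python's standard urllib.robotparser module before the file goes live. This is a rough sketch: the page URL is a placeholder, and Python's parser applies the first matching rule in file order, so its answers can differ from Google's for files that mix Allow and Disallow rules within one group.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())   # parse text directly, no network request

for bot in ("AhrefsBot", "SemrushBot", "Googlebot"):
    allowed = parser.can_fetch(bot, "https://yourdomain.com/blog/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# AhrefsBot: blocked
# SemrushBot: blocked
# Googlebot: allowed
```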

Critical Mistakes to Avoid

  • Disallow: / on a live site: This blocks all compliant crawlers from your entire site. Googlebot stops crawling, and your pages can lose their descriptions and drop out of search results over time.
  • Blocking CSS and JavaScript: Google needs to render your pages to evaluate them. Blocking assets in /assets/, /css/, or /js/ can prevent proper rendering and hurt rankings.
  • Conflicting rules: If a URL matches both an Allow and a Disallow rule, the more specific rule (the one with the longer path) wins. If the paths are equal in length, Allow wins in Google's implementation.
  • Forgetting to remove development disallows: Many WordPress sites have Disallow: / added during development, and it is never removed when the site goes live.
  • Using robots.txt for sensitive pages: Disallowing a page that contains sensitive information is not enough. The robots.txt file is publicly readable, so listing a path advertises it, and the URL can still be discovered. Use authentication or server-side access control instead.
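The precedence rule for conflicting rules can be sketched in a few lines of Python. This is a simplified illustration of longest-match precedence as described above, not Google's actual parser: it handles plain path prefixes only and ignores the * and $ wildcards. The function name is_allowed is hypothetical.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under longest-match precedence.

    rules is a list of ("allow" | "disallow", path_prefix) tuples from one
    user-agent group. Wildcards (* and $) are not handled in this sketch.
    """
    best_len = -1
    verdict = True                      # no matching rule means crawling is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue                    # empty or non-matching rule
        allow = directive == "allow"
        # A longer prefix is more specific; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len, verdict = len(prefix), allow
    return verdict

rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/blog/"))                    # True
```

The order of the rules in the list does not affect the result, which matches the idea that specificity, not position in the file, decides the outcome.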

robots.txt and Crawl Budget

Crawl budget is the number of pages Googlebot crawls on your site within a given timeframe. It is not a fixed allowance: Google adjusts it according to how much crawling your server can handle and how much demand there is for your content, which depends on factors such as popularity and freshness. Small and new sites rarely need to worry about crawl budget, but on large sites it is the reason to keep crawlers away from thin, low-value URLs.

Every page that Googlebot crawls but finds unworthy of indexing (thin content, duplicate, low quality) uses crawl budget without benefit. Remove those pages from your sitemap, then either disallow them in robots.txt to stop them being crawled or noindex them to keep them out of search results. Only the disallow saves crawling time, because a noindexed page still has to be crawled for the tag to be seen.

The correct tool for each job:

  • robots.txt Disallow: prevents crawling. Use for admin areas, infinite scroll URLs, internal search result pages, and staging directories.
  • noindex meta tag: prevents indexing but allows crawling. Use for pages you want Google to be able to crawl (for link equity purposes) but not show in search results.
  • Both together: avoid this combination. A disallowed page is never fetched, so Googlebot cannot see its noindex tag, and the URL can still be indexed if other pages link to it. To remove a page from search results, leave it crawlable with noindex until it drops out; for private content, use authentication.
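To confirm that a page really carries the noindex tag, its HTML can be checked with Python's standard html.parser module. The sketch below is an illustration only: the class and function names are hypothetical, and it looks only at meta name="robots" tags, not at bot-specific tags or HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, follow"
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())

def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))                                  # True
print(has_noindex("<html><head></head><body></body></html>"))  # False
```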

robots.txt for WordPress Sites

WordPress generates specific URL patterns that are typically worth blocking in robots.txt:

User-agent: *
Disallow: /wp-admin/
Disallow: /xmlrpc.php
Disallow: /?s=    # Search result pages
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yourdomain.com/sitemap_index.xml

The Allow: /wp-admin/admin-ajax.php line is important because some themes and plugins load front-end content through that endpoint, and blocking it can stop Googlebot from rendering those features. The /wp-includes/ directory is deliberately not blocked, since it holds scripts and styles that front-end pages load. WordPress itself serves a default virtual robots.txt containing the /wp-admin/ rules, and SEO plugins such as Yoast and RankMath let you edit the file from the dashboard.

How to Test Your robots.txt File

Test every robots.txt change, both before and after deploying it:

  1. Google Search Console robots.txt report: Go to Settings → robots.txt to see which robots.txt files Google found for your site, when each was last fetched, and any warnings or errors. This report replaced the older robots.txt tester.
  2. Direct URL check: Visit yourdomain.com/robots.txt in a browser to confirm the file is accessible and contains the correct content.
  3. Google URL Inspection tool: After making changes, use URL Inspection on a specific page to confirm whether Googlebot can access it or is blocked by robots.txt.
  4. Third-party robots.txt validators: These parse the file, highlight syntax errors, and let you test individual URLs against your rules before you deploy.

A robots.txt change takes effect the next time Googlebot reads the file. Google generally caches robots.txt for up to 24 hours, so allow about a day. Major disallow additions can take several days or longer to show up as blocked pages in Google Search Console's Page indexing report (formerly the Coverage report).

Generating a robots.txt File

Our Robots.txt Generator lets you configure your crawling rules using a simple form — choose which directories to allow or disallow, enter your sitemap URL, and the tool generates the correct robots.txt content ready to copy. Pair it with our Sitemap Generator to create both key SEO files in minutes.
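For readers who prefer a script to a form, the same idea fits in a short Python function. This is a minimal sketch and not the generator's actual implementation; build_robots_txt and its parameters are hypothetical names chosen for the example.

```python
def build_robots_txt(disallow=(), allow=(), sitemap=None, user_agent="*"):
    """Return robots.txt content for one user-agent group."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if not disallow and not allow:
        lines.append("Allow: /")        # explicit "allow everything" group
    if sitemap:
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"

content = build_robots_txt(
    disallow=["/wp-admin/", "/cart/"],
    allow=["/wp-admin/admin-ajax.php"],
    sitemap="https://yourdomain.com/sitemap.xml",
)
print(content)
```

The output can be saved as robots.txt in the site's root directory.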

Frequently Asked Questions

What does a robots.txt file do?

robots.txt tells search engine crawlers (Googlebot, Bingbot, etc.) which pages or sections of your site they should or should not crawl. It is placed at the root of your domain (/robots.txt). Crawlers read it before accessing any other page on your site.

Does robots.txt prevent pages from appearing in search results?

No. Disallowing a URL in robots.txt prevents Googlebot from crawling it but does not prevent it from being indexed if other pages link to it. To prevent indexing, use a noindex meta tag on the page itself and leave the page crawlable, because Googlebot has to fetch the page to see the tag. Disallowing a noindexed page hides the tag from Googlebot and defeats it.

What happens if I block Googlebot by mistake?

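If you accidentally block Googlebot from your important pages (e.g., Disallow: /), Google cannot crawl your site, and your pages can lose their descriptions and drop out of search results over time. Check your rules with a robots.txt validator before deploying, then use the robots.txt report and the URL Inspection tool in Google Search Console to confirm that Googlebot can still reach your pages.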

Do I need a robots.txt file if I want everything indexed?

No. If you want all pages crawled and indexed, you don't need a robots.txt file at all — crawlers will access everything by default. However, it is good practice to have one because it lets you specify the location of your sitemap, which helps search engines discover your pages.