What Is robots.txt?
robots.txt is a plain text file placed at https://yourdomain.com/robots.txt. When a search engine crawler visits your site, it checks this file first. The file uses the Robots Exclusion Protocol — a standard that most major crawlers respect.
Important limitations to understand:
- robots.txt is a request, not a firewall. Malicious bots ignore it. It is only for compliant crawlers like Googlebot and Bingbot.
- Disallowing a URL prevents crawling but not indexing. Google can still index a URL it has never crawled if other pages link to it.
- robots.txt rules apply to crawling only. For indexing control, use the noindex meta tag (<meta name="robots" content="noindex">).
- The file must be in the root directory. A robots.txt in a subfolder is ignored by crawlers.
The Basic Syntax
A robots.txt file contains groups of directives called "records." Each record starts with one or more User-agent lines, followed by Allow or Disallow rules:
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
- User-agent: * — applies the rules that follow to all crawlers. Replace * with a specific bot name like Googlebot to target only that crawler.
- Allow: / — explicitly allows crawling of everything (this is actually the default, so it can be omitted).
- Disallow: /admin/ — prevents crawling of the /admin/ directory and all URLs within it.
- Sitemap: — tells crawlers where your XML sitemap is. Not technically part of the Robots Exclusion Protocol but supported by all major crawlers.
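You can check how these directives behave using Python's standard-library robots.txt parser. A minimal sketch (the domain is a placeholder; the explicit Allow: / line is omitted because allowing is the default):

```python
from urllib import robotparser

# The example rules as raw lines; parse() accepts any iterable of lines.
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# /admin/ and everything under it is blocked for compliant crawlers...
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/login"))  # False
# ...while every other path remains crawlable by default.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/about.html"))   # True
```

The same parser is what well-behaved Python crawlers use before fetching a page, so it is a reasonable stand-in for how a compliant bot reads your file.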
Common robots.txt Patterns
Allow everything (simplest possible file):
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Block specific directories:
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Sitemap: https://yourdomain.com/sitemap.xml
Block all crawlers from everything (staging/dev sites):
User-agent: *
Disallow: /
Use this on development or staging servers to prevent them from being indexed. Never use it on your live production site.
Block specific crawlers (e.g., to save bandwidth from scrapers):
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: *
Allow: /
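The per-bot grouping above can also be checked with the stdlib parser. A quick sketch, assuming the rules exactly as written (URLs are placeholders):

```python
from urllib import robotparser

# The "block specific crawlers" rules from above.
rules = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The named bots are blocked everywhere; everyone else falls through to the * group.
print(rp.can_fetch("AhrefsBot", "https://yourdomain.com/pricing"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/pricing"))  # True
```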
Critical Mistakes to Avoid
- Disallow: / on a live site — This blocks all crawling of your entire site. Googlebot will stop crawling and eventually deindex all your pages.
- Blocking CSS and JavaScript — Google needs to render your pages to evaluate them. Blocking assets in /assets/, /css/, or /js/ can prevent proper rendering and hurt rankings.
- Conflicting rules — If a URL matches both an Allow and Disallow rule, the more specific rule wins. If they are equal length, Allow wins (in Google's implementation).
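Google's precedence rule for conflicting Allow/Disallow lines can be sketched in a few lines of Python. This is a simplified illustration using plain path prefixes (real Googlebot matching also supports * and $ wildcards):

```python
def crawl_allowed(rules, path):
    """Decide crawl permission the way Google resolves conflicts:
    the longest (most specific) matching rule wins, and when an Allow
    and a Disallow of equal length both match, Allow wins.

    rules: list of (directive, path_prefix) pairs.
    Simplification: prefix matching only, no wildcard support.
    """
    best = None  # (pattern_length, is_allow); tuple comparison breaks ties toward Allow
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(crawl_allowed(rules, "/admin/secret"))       # False (only Disallow matches)
print(crawl_allowed(rules, "/admin/public/page"))  # True (longer Allow rule wins)
```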
- Forgetting to remove development disallows — Many WordPress sites have Disallow: / added during development, and it is never removed when the site goes live.
- Using robots.txt for sensitive pages — If a page contains sensitive information, disallowing it is not enough — it could still be discovered. Use authentication or server-side access control instead.
Generating a robots.txt File
Our Robots.txt Generator lets you configure your crawling rules using a simple form — choose which directories to allow or disallow, enter your sitemap URL, and the tool generates the correct robots.txt content ready to copy. Pair it with our Sitemap Generator to create both key SEO files in minutes.
Generate your robots.txt — free
Build a correct robots.txt file using a simple form. No technical knowledge needed.

Frequently Asked Questions
What does a robots.txt file do?
robots.txt tells search engine crawlers (Googlebot, Bingbot, etc.) which pages or sections of your site they should or should not crawl. It is placed at the root of your domain (/robots.txt). Crawlers read it before accessing any other page on your site.
Does robots.txt prevent pages from appearing in search results?
No — disallowing a URL in robots.txt prevents Googlebot from crawling it but does not prevent it from being indexed if other pages link to it. To prevent indexing, use a noindex meta tag on the page itself — and note that the page must remain crawlable for this to work, because Googlebot cannot see a noindex tag on a page it is blocked from fetching.
What happens if I block Googlebot by mistake?
If you accidentally block Googlebot from your important pages (e.g., Disallow: /), Google cannot crawl your site. Your pages will eventually be deindexed. Use Google Search Console's robots.txt tester to verify your rules before deploying.
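You can also sanity-check a draft robots.txt yourself before uploading it. A small sketch using Python's standard library (the draft contents and URLs are example assumptions):

```python
from urllib import robotparser

# A draft robots.txt to verify before deploying (contents are an example).
draft = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Every URL listed here should stay crawlable; this catches an
# accidental "Disallow: /" before it reaches production.
important = [
    "https://yourdomain.com/",
    "https://yourdomain.com/products/",
]
for url in important:
    if not rp.can_fetch("Googlebot", url):
        print("WARNING: important URL is blocked:", url)
```

Running a check like this in a deploy script turns a catastrophic mistake into a build failure.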
Do I need a robots.txt file if I want everything indexed?
No. If you want all pages crawled and indexed, you don't need a robots.txt file at all — crawlers will access everything by default. However, it is good practice to have one because it lets you specify the location of your sitemap, which helps search engines discover your pages.
← Back to Blog | Related tool: Robots.txt Generator