What Is robots.txt?
robots.txt is a plain text file placed at https://yourdomain.com/robots.txt. When a search engine crawler visits your site, it checks this file first. The file uses the Robots Exclusion Protocol — a standard that most major crawlers respect.
Important limitations to understand:
- robots.txt is a request, not a firewall. Malicious bots ignore it. It is only for compliant crawlers like Googlebot and Bingbot.
- Disallowing a URL prevents crawling but not indexing. Google can still index a URL it has never crawled if other pages link to it.
- robots.txt rules apply to crawling only. For indexing control, use the noindex meta tag (<meta name="robots" content="noindex">).
- The file must be in the root directory. A robots.txt in a subfolder is ignored by crawlers.
The Basic Syntax
A robots.txt file contains groups of directives called "records." Each record starts with one or more User-agent lines, followed by Allow or Disallow rules:
User-agent: *
Allow: /
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
- User-agent: * — applies the rules that follow to all crawlers. Replace * with a specific bot name like Googlebot to target only that crawler.
- Allow: / — explicitly allows crawling of everything (this is actually the default, so it can be omitted).
- Disallow: /admin/ — prevents crawling of the /admin/ directory and all URLs within it.
- Sitemap: — tells crawlers where your XML sitemap is. Not technically part of the Robots Exclusion Protocol but supported by all major crawlers.
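You can check how these directives behave using Python's standard-library robots.txt parser. A minimal sketch (the domain is a placeholder; the explicit Allow: / line is omitted because allowing is the default):

```python
from urllib import robotparser

# The example rules as raw lines; parse() accepts any iterable of lines.
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# /admin/ and everything under it is blocked for compliant crawlers...
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/login"))  # False
# ...while every other path remains crawlable by default.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/about.html"))   # True
```

The same parser is what well-behaved Python crawlers use before fetching a page, so it is a reasonable stand-in for how a compliant bot reads your file.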
Common robots.txt Patterns
Allow everything (simplest possible file):
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Block specific directories:
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Sitemap: https://yourdomain.com/sitemap.xml
Block all crawlers from everything (staging/dev sites):
User-agent: *
Disallow: /
Use this on development or staging servers to prevent them from being indexed. Never use it on your live production site.
Block specific crawlers (e.g., to save bandwidth from scrapers):
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: *
Allow: /
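The per-bot grouping above can also be checked with the stdlib parser. A quick sketch, assuming the rules exactly as written (URLs are placeholders):

```python
from urllib import robotparser

# The "block specific crawlers" rules from above.
rules = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The named bots are blocked everywhere; everyone else falls through to the * group.
print(rp.can_fetch("AhrefsBot", "https://yourdomain.com/pricing"))  # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/pricing"))  # True
```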
Critical Mistakes to Avoid
- Disallow: / on a live site — This blocks all crawling of your entire site. Googlebot will stop crawling and eventually deindex all your pages.
- Blocking CSS and JavaScript — Google needs to render your pages to evaluate them. Blocking assets in /assets/, /css/, or /js/ can prevent proper rendering and hurt rankings.
- Conflicting rules — If a URL matches both an Allow and Disallow rule, the more specific rule wins. If they are equal length, Allow wins (in Google's implementation).
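Google's precedence rule for conflicting Allow/Disallow lines can be sketched in a few lines of Python. This is a simplified illustration using plain path prefixes (real Googlebot matching also supports * and $ wildcards):

```python
def crawl_allowed(rules, path):
    """Decide crawl permission the way Google resolves conflicts:
    the longest (most specific) matching rule wins, and when an Allow
    and a Disallow of equal length both match, Allow wins.

    rules: list of (directive, path_prefix) pairs.
    Simplification: prefix matching only, no wildcard support.
    """
    best = None  # (pattern_length, is_allow); tuple comparison breaks ties toward Allow
    for directive, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(crawl_allowed(rules, "/admin/secret"))       # False (only Disallow matches)
print(crawl_allowed(rules, "/admin/public/page"))  # True (longer Allow rule wins)
```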
- Forgetting to remove development disallows — Many WordPress sites have Disallow: / added during development, and it is never removed when the site goes live.
- Using robots.txt for sensitive pages — If a page contains sensitive information, disallowing it is not enough — it could still be discovered. Use authentication or server-side access control instead.
Generating a robots.txt File
Our Robots.txt Generator lets you configure your crawling rules using a simple form — choose which directories to allow or disallow, enter your sitemap URL, and the tool generates the correct robots.txt content ready to copy. Pair it with our Sitemap Generator to create both key SEO files in minutes.
Generate your robots.txt — free
Build a correct robots.txt file using a simple form. No technical knowledge needed.

Frequently Asked Questions
What does a robots.txt file do?
robots.txt tells search engine crawlers (Googlebot, Bingbot, etc.) which pages or sections of your site they should or should not crawl. It is placed at the root of your domain (/robots.txt). Crawlers read it before accessing any other page on your site.
Does robots.txt prevent pages from appearing in search results?
No — disallowing a URL in robots.txt prevents Googlebot from crawling it but does not prevent it from being indexed if other pages link to it. To prevent indexing, use a noindex meta tag on the page itself — and note that the page must remain crawlable for this to work, because Googlebot cannot see a noindex tag on a page it is blocked from fetching.
What happens if I block Googlebot by mistake?
If you accidentally block Googlebot from your important pages (e.g., Disallow: /), Google cannot crawl your site. Your pages will eventually be deindexed. Use Google Search Console's robots.txt tester to verify your rules before deploying.
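You can also sanity-check a draft robots.txt yourself before uploading it. A small sketch using Python's standard library (the draft contents and URLs are example assumptions):

```python
from urllib import robotparser

# A draft robots.txt to verify before deploying (contents are an example).
draft = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cart/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Every URL listed here should stay crawlable; this catches an
# accidental "Disallow: /" before it reaches production.
important = [
    "https://yourdomain.com/",
    "https://yourdomain.com/products/",
]
for url in important:
    if not rp.can_fetch("Googlebot", url):
        print("WARNING: important URL is blocked:", url)
```

Running a check like this in a deploy script turns a catastrophic mistake into a build failure.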
Do I need a robots.txt file if I want everything indexed?
No. If you want all pages crawled and indexed, you don't need a robots.txt file at all — crawlers will access everything by default. However, it is good practice to have one because it lets you specify the location of your sitemap, which helps search engines discover your pages.
← Back to Blog | Related tool: Robots.txt Generator