Robots.txt Generator
Build a robots.txt file with crawl rules and sitemap URL in seconds. Free, instant.
What Is a robots.txt File?
A robots.txt file is a plain text file placed at the root of your website that tells search engine crawlers which pages or sections they may or may not crawl. Before a crawler like Googlebot crawls your site, it fetches https://yourdomain.com/robots.txt (or reuses a recently cached copy) and applies the rules it finds there. The rules are advisory: well-behaved crawlers follow them, but the file does not enforce access control.
The file follows the Robots Exclusion Protocol (standardized as RFC 9309), which is supported by all major search engines including Google, Bing, and DuckDuckGo. It gives you control over how bots spend their time on your site, a concept called crawl budget. Blocking low-value pages (admin panels, duplicate search result pages, login pages) frees up crawl budget so Google spends more time on the pages you actually want indexed.
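To illustrate how a crawler applies these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and URLs are hypothetical, and the standard-library parser only approximates how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /login/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/admin/users"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/blog/post-1"))  # True
```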
robots.txt Syntax Reference
| Directive | Meaning | Example |
|---|---|---|
| User-agent | Which bot the rules apply to | User-agent: * |
| Disallow | Block crawling of a path | Disallow: /admin/ |
| Allow | Explicitly allow a sub-path inside a disallowed path | Allow: /public/ |
| Sitemap | Pointer to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds to wait between requests (non-standard; ignored by Googlebot, honored by some crawlers such as Bingbot) | Crawl-delay: 10 |
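The directives in the table can be assembled programmatically. The following is a rough sketch of how a generator might do it; the function name build_robots_txt and the dictionary keys are assumptions made for this example, not the interface of any particular tool.

```python
from typing import Any, Iterable, Mapping, Sequence


def build_robots_txt(
    groups: Sequence[Mapping[str, Any]],
    sitemaps: Iterable[str] = (),
) -> str:
    """Assemble robots.txt text from rule groups and sitemap URLs.

    Each group is a mapping with a 'user_agent' key and optional
    'disallow', 'allow', and 'crawl_delay' keys (hypothetical schema).
    """
    lines = []
    for group in groups:
        lines.append(f"User-agent: {group['user_agent']}")
        for path in group.get("disallow", []):
            # An empty path yields "Disallow:", which blocks nothing.
            lines.append(f"Disallow: {path}".rstrip())
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        if group.get("crawl_delay") is not None:
            # Crawl-delay is non-standard and not honored by every crawler.
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


robots = build_robots_txt(
    groups=[{"user_agent": "*", "disallow": ["/admin/"], "allow": ["/public/"]}],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots)
```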
robots.txt Examples for Common Scenarios
Allow all bots to crawl everything (most websites):
User-agent: *
Disallow:
Block all bots from the entire site (e.g. staging server):
User-agent: *
Disallow: /
Block admin and login pages (WordPress example):
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml
Block a specific bot only:
User-agent: AhrefsBot
Disallow: /
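In the WordPress example above, the Allow line takes effect even though it sits under a broader Disallow. Under RFC 9309, the rule with the longest matching path wins, and Allow wins a tie. The sketch below is a simplified illustration of that precedence: it handles plain path prefixes only (no * or $ wildcards), and the function name is_path_allowed is made up for this example. It is hand-rolled because Python's urllib.robotparser evaluates rules in file order rather than by longest match.

```python
from typing import Iterable, Tuple


def is_path_allowed(rules: Iterable[Tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, pattern) pairs.

    Simplified: patterns are treated as literal path prefixes, so the
    * and $ wildcards are NOT supported in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # an empty pattern matches nothing
        is_allow = directive.lower() == "allow"
        longer = len(pattern) > best_len
        tie_goes_to_allow = len(pattern) == best_len and is_allow
        if longer or tie_goes_to_allow:
            best_len = len(pattern)
            allowed = is_allow
    return allowed


# The rules from the WordPress example, for the "*" user-agent group.
WORDPRESS_RULES = [
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/wp-login.php"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

print(is_path_allowed(WORDPRESS_RULES, "/wp-admin/options.php"))     # False
print(is_path_allowed(WORDPRESS_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_path_allowed(WORDPRESS_RULES, "/blog/hello-world/"))        # True
```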
Common Paths to Disallow
Paths commonly blocked from crawlers include /admin/, /wp-admin/, /login/, /cart/, /checkout/, /search (to avoid near-duplicate internal search result pages generated by query strings), and /private/. Only block paths that genuinely should not be crawled. Over-blocking can hurt your SEO by preventing Google from reaching important pages.
robots.txt vs noindex Meta Tag
These two tools are often confused. They do different things:
- robots.txt Disallow: stops Google from crawling a page. Google never loads the page. However, if another website links to that blocked URL, Google may still show it in search results (as a URL with no description), because it learned the URL exists from the external link.
- noindex meta tag (<meta name="robots" content="noindex">): tells Google not to index a page it has crawled. Google loads the page, reads the tag, and drops the page from its index. This is the reliable way to keep pages out of search results.
Critical rule: if you block a URL in robots.txt, Google cannot read the noindex tag on that page. Never disallow a URL you want noindexed. Let Google crawl it so it can see the noindex instruction.
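One way to catch this conflict is to check each page you intend to noindex against your robots.txt rules. The sketch below uses Python's standard urllib.robotparser; the function name find_unreadable_noindex and the sample URLs are hypothetical, and the standard-library parser may differ from Google's own matching in edge cases.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def find_unreadable_noindex(
    robots_txt: str,
    noindex_urls: Iterable[str],
    user_agent: str = "Googlebot",
) -> List[str]:
    """Return the noindex URLs that robots.txt prevents `user_agent` from crawling.

    A crawler that cannot load these pages never sees their noindex tag.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and pages for illustration.
rules = "User-agent: *\nDisallow: /private/\n"
pages_with_noindex = [
    "https://example.com/private/old-offer",
    "https://example.com/thank-you",
]

print(find_unreadable_noindex(rules, pages_with_noindex))
# ['https://example.com/private/old-offer']
```

Any URL the function returns needs either its Disallow rule removed or a different way of keeping it out of search results.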
How to Test Your robots.txt
After uploading your robots.txt file, test it in Google Search Console: go to Settings > robots.txt to view the file Google has cached, or use the URL Inspection tool to check whether a specific page is blocked. You can also visit your live robots.txt directly in a browser at https://yourdomain.com/robots.txt to confirm the file is accessible.
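The accessibility check can also be scripted. This sketch derives the robots.txt location from any page URL and defines a helper that fetches it with Python's standard library. The function names are illustrative, and fetch_robots makes a real network request when called, so it is defined here but not invoked.

```python
from typing import Tuple
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts `page_url`."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the scheme + host (+ port).
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Fetch the live robots.txt and return (HTTP status, body text).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    for example when the file is missing.
    """
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


print(robots_url("https://example.com/blog/post?x=1"))  # https://example.com/robots.txt
```

Calling fetch_robots("https://yourdomain.com/") should return status 200 and the file contents if the upload succeeded.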