Does robots.txt prevent pages from being indexed?

No. Disallow blocks crawling, not indexing. If another website links to a blocked URL, Google may still show that URL in search results with no description. To reliably remove a page from search results, add a noindex meta tag to the page itself and leave the page crawlable: if robots.txt blocks a noindexed page, Google cannot read the noindex instruction.

Where should I put my robots.txt file?

At the root of your domain: https://yourdomain.com/robots.txt. Crawlers look only at this exact location, so a file placed in a subdirectory is ignored. Subdomains each need their own separate robots.txt file. The file must be accessible without authentication.
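As a rough illustration of the location rule, the sketch below derives the robots.txt URL that applies to any page URL. The function name robots_txt_url is made up for this example, and the shop subdomain is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?id=7"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart/"))
# https://shop.yourdomain.com/robots.txt
```

The second call shows why a subdomain needs its own file: its robots.txt URL lives on a different host.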

What does Disallow: / mean?

It blocks crawlers from accessing the entire website. This is appropriate for staging servers or private sites you do not want crawled. For a live public website, use Disallow: with no path (or simply omit the Disallow line) to allow all crawling.
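The difference between the two forms can be checked with Python's standard urllib.robotparser module. This is a minimal sketch: the helper name can_crawl and the example.com URL are assumptions for illustration, and a real crawler's parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(BLOCK_ALL, "https://example.com/any/page"))  # False
print(can_crawl(ALLOW_ALL, "https://example.com/any/page"))  # True
```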

What is the Crawl-delay directive?

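It tells crawlers how many seconds to wait between requests, which can reduce server load during heavy crawling. Important: Googlebot ignores the Crawl-delay directive. Google retired the crawl rate setting in Search Console in early 2024. Googlebot now adjusts its crawl rate automatically based on how your server responds, and Google's documented way to slow it down temporarily is to return 500, 503, or 429 status codes.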
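For a crawler that does honor Crawl-delay, reading the value might look like the sketch below, which uses Python's standard urllib.robotparser. The bot name MyCrawler, the helper polite_delay, and the one-second fallback are all assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""


def polite_delay(robots_txt: str, agent: str, default: float = 1.0) -> float:
    """Return the Crawl-delay for agent in seconds, or default if none is set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when the directive is absent
    return float(delay) if delay is not None else default


print(polite_delay(ROBOTS_TXT, "MyCrawler"))  # 10.0
```

A crawler would then sleep for that many seconds between requests to the same host.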

Should I include my sitemap URL in robots.txt?

Yes, this is best practice. Adding a Sitemap: line (e.g., Sitemap: https://example.com/sitemap.xml) to your robots.txt helps all crawlers discover your sitemap, including search engines where you have not submitted it through a webmaster console. The line is independent of the User-agent blocks, so it is conventionally placed at the bottom of the file.
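A crawler can pick up the sitemap location while parsing the file. The sketch below uses site_maps() from Python's standard urllib.robotparser (available from Python 3.8). The helper name sitemap_urls is hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def sitemap_urls(robots_txt: str) -> list:
    """Return every Sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))  # ['https://example.com/sitemap.xml']
```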

Can I have multiple User-agent blocks?

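Yes. You can write separate rule sets for different bots. Each block starts with a User-agent: line followed by its rules. Use User-agent: * to set rules for all bots, and add blocks with specific bot names to override those rules for particular crawlers. A crawler follows only the most specific block that matches its name and does not merge it with the * block, so repeat any shared rules inside each bot-specific block.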
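The effect of separate blocks can be sketched with Python's standard urllib.robotparser. The file contents and the helper name bot_allowed are made up for this example. AhrefsBot gets its own block, while every other bot falls back to the * block.

```python
from urllib.robotparser import RobotFileParser

TWO_BLOCKS = """\
User-agent: *
Disallow: /admin/

User-agent: AhrefsBot
Disallow: /
"""


def bot_allowed(agent: str, url: str, robots_txt: str = TWO_BLOCKS) -> bool:
    """Report whether agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(bot_allowed("AhrefsBot", "https://example.com/blog/"))        # False
print(bot_allowed("Googlebot", "https://example.com/blog/"))        # True
print(bot_allowed("Googlebot", "https://example.com/admin/users"))  # False
```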

Robots.txt Generator

Build a robots.txt file with crawl rules and sitemap URL in seconds. Free, instant.

What Is a robots.txt File?

A robots.txt file is a plain text file placed at the root of your website that tells search engine crawlers which pages or sections they are allowed or not allowed to access. When a crawler like Googlebot visits your site, the very first thing it does is fetch https://yourdomain.com/robots.txt and read the instructions before crawling anything else.

The file follows the Robots Exclusion Protocol (standardized as RFC 9309), which is supported by all major search engines including Google, Bing, and DuckDuckGo. It gives you control over how bots spend their time on your site, a concept called crawl budget. Blocking low-value pages (admin panels, duplicate search result pages, login pages) frees up crawl budget so Google spends more time on the pages you actually want indexed.

robots.txt Syntax Reference

Directive	Meaning	Example
User-agent	Which bot the rules apply to	`User-agent: *`
Disallow	Block crawling of a path	`Disallow: /admin/`
Allow	Explicitly allow a sub-path	`Allow: /public/`
Sitemap	Pointer to your XML sitemap	`Sitemap: https://example.com/sitemap.xml`
Crawl-delay	Seconds to wait between requests	`Crawl-delay: 10`
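To show how these directives fit together, here is a minimal sketch of a generator that assembles a robots.txt file from structured input. It is not the code behind this tool. The Group class and build_robots_txt function are hypothetical names, and the output layout (one blank line between blocks, Sitemap lines last) is a convention rather than a requirement.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Group:
    """One User-agent block (hypothetical structure for this sketch)."""
    user_agent: str = "*"
    disallow: List[str] = field(default_factory=list)
    allow: List[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: Iterable[Group], sitemaps: Iterable[str] = ()) -> str:
    lines: List[str] = []
    for group in groups:
        lines.append(f"User-agent: {group.user_agent}")
        if not group.disallow and not group.allow:
            lines.append("Disallow:")  # empty Disallow allows everything
        lines.extend(f"Disallow: {path}" for path in group.disallow)
        lines.extend(f"Allow: {path}" for path in group.allow)
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # blank line separates blocks
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


wordpress = Group(
    disallow=["/wp-admin/", "/wp-login.php"],
    allow=["/wp-admin/admin-ajax.php"],
)
print(build_robots_txt([wordpress], ["https://example.com/sitemap.xml"]))
```

Calling build_robots_txt([Group()]) with no paths produces the allow-everything file, because an empty Disallow: line is emitted for a block without rules.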

robots.txt Examples for Common Scenarios

Allow all bots to crawl everything (most websites):

User-agent: *
Disallow:

Block all bots from the entire site (e.g. staging server):

User-agent: *
Disallow: /

Block admin and login pages (WordPress example):

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml
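The Allow line in the WordPress example works because Google and RFC 9309 resolve conflicts by the most specific (longest) matching path, with Allow winning a tie. The sketch below illustrates that precedence rule under simplifying assumptions: it matches plain path prefixes only, with no * or $ wildcards, and the function name is_allowed is made up for this example.

```python
from typing import List, Tuple

WORDPRESS_RULES: List[Tuple[str, str]] = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-login.php"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to simple prefix rules."""
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


print(is_allowed(WORDPRESS_RULES, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(WORDPRESS_RULES, "/wp-admin/options.php"))     # False
print(is_allowed(WORDPRESS_RULES, "/blog/hello-world/"))        # True
```

Not every parser implements this rule. Some simpler ones apply the first matching line in file order, so it is worth testing the file against the crawler you care about.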

Block a specific bot only:

User-agent: AhrefsBot
Disallow: /

Common Paths to Disallow

Paths you typically block from crawlers include: /admin/, /wp-admin/, /login/, /cart/, /checkout/, /search (to avoid duplicate content from query strings), and /private/. Only block paths that genuinely should not be crawled. Over-blocking can hurt your SEO by preventing Google from reaching important pages.

robots.txt vs noindex Meta Tag

These two tools are often confused. They do different things:

A robots.txt Disallow rule stops Google from crawling a page. Google never loads the page. However, if another website links to that blocked URL, Google may still show it in search results (as a URL with no description), because it learned the URL exists from the external link.
A noindex meta tag (<meta name="robots" content="noindex"> in the page's head) tells Google not to index a page it has crawled. Google loads the page, reads the tag, and removes the page from search results. This is the reliable way to keep pages out of search results.

Critical rule: if you block a URL in robots.txt, Google cannot read the noindex tag on that page. Never disallow a URL you want noindexed — let Google crawl it so it can see the noindex instruction.
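This conflict is easy to check for automatically. The sketch below takes a robots.txt file and a list of URLs you have marked noindex, then reports any that are blocked from crawling. It uses Python's standard urllib.robotparser; the helper name blocked_noindex_urls and the example URLs are assumptions, and the parser only approximates how Googlebot evaluates rules.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_noindex_urls(
    robots_txt: str, noindex_urls: Iterable[str], agent: str = "Googlebot"
) -> List[str]:
    """Return the noindexed URLs that robots.txt stops agent from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


robots_txt = "User-agent: *\nDisallow: /private/\n"
noindexed = [
    "https://example.com/private/old-page",  # blocked, so noindex is never seen
    "https://example.com/thank-you",         # crawlable, so noindex works
]
print(blocked_noindex_urls(robots_txt, noindexed))
# ['https://example.com/private/old-page']
```

Any URL in the result needs its Disallow rule removed before the noindex tag can take effect.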

How to Test Your robots.txt

After uploading your robots.txt file, test it in Google Search Console: go to Settings > robots.txt to view the file Google has cached, or use the URL Inspection tool to check whether a specific page is blocked. You can also visit your live robots.txt directly in a browser at https://yourdomain.com/robots.txt to confirm the file is accessible.