Build a robots.txt file for your website. Control which crawlers can access which parts of your site.
Robots.txt is a plain text file placed at the root of your website (e.g., yoursite.com/robots.txt) that instructs web crawlers which pages or directories they are and aren't allowed to access. It follows the Robots Exclusion Protocol, a convention dating back to 1994 and formalized as RFC 9309 in 2022, which is honored by all major search engines and most well-behaved bots.
The file uses simple directives: User-agent specifies which bot the rule applies to (* means all), Allow permits access to a path, Disallow blocks it, and Sitemap points crawlers to your sitemap. Robots.txt is a request, not a security measure — malicious bots ignore it entirely. Never use robots.txt to hide sensitive pages; use proper authentication instead. Common use cases: blocking crawlers from /admin, /checkout, staging environments, or duplicate URL patterns with parameters. Since 2023, you can also use robots.txt to block AI training scrapers like GPTBot, CCBot, and Google-Extended.
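For example, a minimal robots.txt that blocks all crawlers from /admin and /checkout while pointing them to a sitemap might look like the sketch below (the paths and sitemap URL are placeholders to replace with your own):

User-agent: *
Disallow: /admin/
Disallow: /checkout/

Sitemap: https://yoursite.com/sitemap.xml

The blank line between groups separates rule blocks; the Sitemap line can appear anywhere in the file and applies to all crawlers.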
Robots.txt tells web crawlers which pages they are and aren't allowed to visit. It must be placed at the root of your domain (yoursite.com/robots.txt). All major search engines, including Google, Bing, and Yahoo, respect its instructions.
Disallowing a page in robots.txt prevents crawling but does NOT prevent the page from appearing in search results if other sites link to it. To completely de-index a page, use the meta robots noindex tag instead. Note that noindex only works if the page remains crawlable: if you also block it in robots.txt, crawlers will never see the tag.
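As a sketch, assuming a standard HTML page, the tag goes in the page's head section:

<meta name="robots" content="noindex">

For non-HTML resources such as PDFs, the equivalent X-Robots-Tag: noindex HTTP response header can be used instead.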
Add separate Disallow rules for known AI training bots: User-agent: GPTBot (OpenAI), User-agent: CCBot (Common Crawl), User-agent: Google-Extended (Google AI training), User-agent: anthropic-ai (Claude). Set Disallow: / under each. Use the "Block AI bots" preset in this generator.
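Put together, those rules look like the following sketch (the generator's preset may differ slightly or cover additional bots):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /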
Yes. Incorrectly blocking important pages in robots.txt is one of the most common SEO mistakes. Blocked pages cannot be crawled, so search engines cannot read, evaluate, or rank their content. Always verify that your robots.txt is not accidentally blocking your key pages using Google Search Console's URL Inspection tool.