Robots.txt Generator

Generate, validate & block AI bots with robots.txt.

Default Settings

🔍 Search Engine Bots

Google (Googlebot)
Google Images (Googlebot-Image)
Google Other (GoogleOther)
Bing (Bingbot)
Yahoo (Slurp)
Yandex (YandexBot)
Baidu (Baiduspider)
DuckDuckGo (DuckDuckBot)
Apple (Applebot)
Naver (Yeti)

🤖 AI Training Crawlers

Blocking these prevents your content from being used to train AI models.

GPTBot (OpenAI)
ClaudeBot (Anthropic)
Google AI Training (Google Gemini)
Common Crawl (CCBot)
Meta AI (Meta Llama)
Bytespider (ByteDance/TikTok)
Cohere (Cohere)
DeepSeek (DeepSeek)
Apple AI Training (Apple Intelligence)
Amazon AI (Amazon)
Diffbot (Diffbot)
AI2Bot (Allen Institute)
Huawei AI (Huawei)
img2dataset (open-source)
ImagesiftBot (The Hive)

🔎 AI Search / Retrieval Crawlers

Blocking these means your content won't appear in AI search results.

OpenAI Search (OpenAI)
Claude Search (Anthropic)
Perplexity (Perplexity AI)
DuckAssist (DuckDuckGo)
iAsk (iAsk.AI)
Liner (Liner AI)
Mistral (Mistral AI)

👤 User-Triggered / Agentic Crawlers

ℹ️ User-triggered crawlers may not fully respect robots.txt. These bots operate when a real user requests content through an AI interface. For stronger protection, use server-level blocking or Cloudflare's AI bot management.

ChatGPT Browse (OpenAI)
Perplexity User (Perplexity)
Meta Fetcher (Meta)
Claude User (Anthropic)
Grok (xAI)

📁 Restricted Directories

robots.txt Preview

robots.txt (153 chars)
# Generated by UtilHub.io — Free Robots.txt Generator
# https://utilhub.io/seo-tools/robots-txt-generator
# Generated: 2026-03-22

User-agent: *
Allow: /

🔒 All generation happens in your browser — nothing is sent to any server

What Is a Robots.txt File and Why Every Website Needs One

A robots.txt file is a plain text file placed in a website's root directory that tells search engine crawlers and bots which pages they can and cannot access. For example, if your site is example.com, the file must be accessible at example.com/robots.txt. This file follows the Robots Exclusion Protocol (REP), which was formalized as an official internet standard in September 2022 under RFC 9309.

When a search engine bot like Googlebot visits a website, the first thing it does is check for a robots.txt file. If the file exists, the bot reads the instructions before crawling any pages. If no file is found, the bot assumes it can crawl everything on the site.

Having a well-configured robots.txt file serves several important purposes. First, it helps manage your crawl budget — the number of pages search engines will crawl on your site during a given visit. Google allocates a limited crawl budget to every website, and if bots waste time crawling admin panels, internal search results, or duplicate pages, your important content may take longer to get indexed. Second, it keeps crawlers out of non-essential or sensitive areas such as staging environments, shopping carts, user account pages, and admin dashboards (note that a blocked URL can still be indexed if other sites link to it; see the Disallow vs. Noindex discussion in the FAQ).

It is important to understand that robots.txt is not a security mechanism. The file is publicly readable — anyone can view it at yoursite.com/robots.txt. Malicious bots and scrapers routinely ignore robots.txt instructions. For truly private content, use proper authentication, server-side access controls, password protection, or IP whitelisting.
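For example, genuinely restricting a directory means authentication at the server, not a robots.txt entry. A hypothetical Apache .htaccess sketch (the AuthUserFile path and realm name are placeholders):

```apache
# Real protection: HTTP Basic authentication at the server level.
# Place in the directory to protect; the AuthUserFile path is a placeholder.
AuthType Basic
AuthName "Private area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user
```

Unlike a Disallow rule, this returns 401 to every client that lacks credentials, whether or not it reads robots.txt.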

Our robots.txt generator runs entirely in your browser. Nothing is uploaded to our servers, and your configuration data never leaves your device. This privacy-first approach means you can safely generate robots.txt files for any project, including internal or confidential websites.

Robots.txt Syntax — Every Directive Explained With Examples

The robots.txt file consists of one or more groups of rules, each beginning with a User-agent line that specifies which bot the rules apply to. Under RFC 9309, there are three standardized directives plus two widely supported extensions.

User-agent specifies which crawler the rules apply to. Use an asterisk (*) to target all bots, or specify individual crawlers by name.

Disallow tells the specified bot not to crawl a path. Each Disallow directive applies to one path. An empty Disallow (Disallow:) means "allow everything."

Allow overrides a Disallow rule for a more specific path. This is useful when you want to block a directory but allow access to specific files within it.

Sitemap (extension, not in RFC 9309 but supported by all major search engines) tells crawlers where to find your XML sitemap.

Crawl-delay (extension) tells bots to wait a specified number of seconds between requests. Google does not support this directive — use Google Search Console's crawl rate settings instead. Bing and Yandex do respect it.
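Putting the five directives together, a minimal file might look like this (the paths and sitemap URL are placeholders):

```text
# One group of rules applying to every crawler
User-agent: *
# Block the admin area...
Disallow: /admin/
# ...but allow one public stylesheet inside it
Allow: /admin/public.css
# Honored by Bing and Yandex; ignored by Google
Crawl-delay: 10

# The Sitemap directive sits outside any group
Sitemap: https://example.com/sitemap.xml
```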

Wildcard patterns are supported by Google, Bing, and Yandex. The asterisk (*) matches zero or more characters, and the dollar sign ($) matches the end of the URL.

Important conflict resolution rule from RFC 9309: When multiple rules match a URL, the most specific (longest path) rule wins, regardless of whether it is an Allow or Disallow.
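The longest-match rule can be sketched in a few lines of Python. This is an illustrative matcher only, not a full RFC 9309 parser: file parsing, percent-encoding, and user-agent grouping are all omitted.

```python
import re

def _pattern_to_regex(path_pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL path.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    regex = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Apply RFC 9309 precedence: the matching rule with the longest
    pattern wins, and ties go to Allow. No matching rule means allowed.

    `rules` is a list of (directive, pattern) tuples, e.g.
    [("Disallow", "/private/"), ("Allow", "/private/public/")].
    """
    best = None  # (pattern_length, is_allow) for the best match so far
    for directive, pattern in rules:
        if pattern and _pattern_to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

With the rules above, `/private/secret.html` is blocked, but `/private/public/index.html` is allowed because the Allow pattern is longer.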

How to Block AI Bots With Robots.txt in 2026

The rise of AI crawlers has fundamentally changed how website owners think about robots.txt. As of March 2026, over 5.6 million websites block OpenAI's GPTBot, and 5.8 million block Anthropic's ClaudeBot. Among top news sites, 79% now block at least one AI training bot. AI bot traffic quadrupled during the first half of 2025, and HUMAN Security reported a 6,900% year-over-year increase in verified AI agent traffic.

Understanding the three types of AI crawlers is essential for making informed blocking decisions:

AI Training Crawlers collect content to train large language models. Blocking these prevents your content from being used in model training but does not affect your visibility in AI search results. The major training crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google Gemini training), CCBot (Common Crawl), Meta-ExternalAgent (Meta/Llama), Bytespider (ByteDance/TikTok), cohere-ai (Cohere), DeepSeekBot (DeepSeek), and Applebot-Extended (Apple Intelligence).

AI Search Crawlers index content to provide answers and citations in AI search products. Blocking these means your content will not appear in AI-powered search results like ChatGPT Search or Perplexity. The main search crawlers are OAI-SearchBot (ChatGPT Search), Claude-SearchBot (Claude search), PerplexityBot (Perplexity), and DuckAssistBot (DuckDuckGo AI Answers).

User-Triggered Crawlers fetch content in real-time when a user asks an AI to browse a specific page. These include ChatGPT-User, Perplexity-User, Meta-ExternalFetcher, and Claude-User. These crawlers may not fully respect robots.txt because they are acting on behalf of a human user.

A critical point: blocking AI training crawlers does NOT affect your Google Search rankings. Google-Extended controls only whether Google uses your content for Gemini AI training — it is completely separate from Googlebot, which handles search indexing.
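As a sketch, a file that blocks the major training crawlers named above while leaving Googlebot and the AI search crawlers untouched could look like:

```text
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else (Googlebot, OAI-SearchBot, PerplexityBot, ...) stays allowed
User-agent: *
Allow: /
```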

It is worth noting that as of early 2026, approximately 13% of AI bot requests ignore robots.txt entirely — up from 3.3% in late 2024. Some newer agentic browsers use standard Chrome user-agent strings, making them indistinguishable from normal browser traffic through robots.txt alone.
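For bots that ignore robots.txt, blocking has to happen at the server instead. A minimal nginx sketch, where the user-agent list is illustrative rather than exhaustive and the `if` block belongs inside a `server` context:

```nginx
# nginx: refuse selected AI crawlers regardless of robots.txt
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Bytespider)") {
    return 403;
}
```

This only helps against bots that announce themselves honestly; agents spoofing a Chrome user-agent require behavioral detection or a service such as Cloudflare's AI bot management.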

Common Robots.txt Mistakes That Hurt Your SEO

Mistake 1: Accidental full site block. Adding Disallow: / under User-agent: * prevents all search engines from crawling your entire site. This can completely de-index your website within weeks.

Mistake 2: Blocking CSS and JavaScript files. Google needs to render your pages exactly as users see them. Blocking CSS and JS files prevents proper rendering, which severely hurts your rankings.

Mistake 3: Case sensitivity errors. Paths in robots.txt are case-sensitive under RFC 9309. Blocking /Admin/ will not block /admin/.

Mistake 4: Missing User-agent declaration. Every Disallow or Allow rule must be preceded by a User-agent line. Rules without a User-agent are invalid.

Mistake 5: Using robots.txt for security. Robots.txt does not hide content — it actually reveals which directories you consider sensitive.

Mistake 6: Forgetting the Sitemap directive. Including your sitemap URL in robots.txt helps search engines discover your content faster.

Mistake 7: File size exceeding 500 KiB. RFC 9309 only requires crawlers to parse the first 500 KiB of the file; content beyond that may be ignored.
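Several of these mistakes are easy to catch mechanically. A short Python sketch (a heuristic check, not a full RFC 9309 validator) that flags a rule with no preceding User-agent, an accidental full-site block, and the 500 KiB limit:

```python
def check_robots_txt(text: str) -> list:
    """Flag a few common robots.txt mistakes. Heuristic only."""
    problems = []
    if len(text.encode("utf-8")) > 500 * 1024:
        problems.append("file exceeds 500 KiB; crawlers may ignore the rest")
    current_agent = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current_agent = value
        elif field in ("allow", "disallow"):
            if current_agent is None:
                problems.append(f"line {lineno}: {field} with no preceding User-agent")
            if field == "disallow" and value == "/" and current_agent == "*":
                problems.append(f"line {lineno}: Disallow: / under * blocks the whole site")
    return problems
```

Running it on `"User-agent: *\nDisallow: /\n"` reports the full-site block; a clean file returns an empty list.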

How to Upload Robots.txt to Your Website

After generating your robots.txt file, it must be placed in your website's root directory at https://yoursite.com/robots.txt.

For traditional hosting with FTP/SFTP access, upload the file to your public_html, www, or htdocs root folder using an FTP client like FileZilla.

For WordPress sites, plugins like Yoast SEO (under SEO → Tools → File Editor) and Rank Math (under General Settings → Edit robots.txt) provide dashboard-based editing without FTP access.

For Shopify, edit the robots.txt.liquid template. Since March 2025, Shopify supports domain-specific rules.

For Angular sites (including Angular 21 with SSR), add robots.txt to src/robots.txt and include it in the assets array of angular.json.
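As an illustration, the relevant fragment of angular.json might look like the following (surrounding project keys are omitted, and the exact structure varies with the Angular builder version):

```json
{
  "architect": {
    "build": {
      "options": {
        "assets": ["src/favicon.ico", "src/robots.txt"]
      }
    }
  }
}
```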

After uploading, verify the file by visiting yoursite.com/robots.txt in your browser. Then test it using Google Search Console's Robots.txt Tester.
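You can also sanity-check rules locally with Python's standard-library parser. Note that urllib.robotparser predates RFC 9309 and may resolve overlapping Allow/Disallow rules differently from Google's longest-match behavior, so use it for quick checks rather than as a reference implementation:

```python
from urllib import robotparser

# Parse an in-memory robots.txt instead of fetching one over HTTP
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

To test a live site, use `rp.set_url("https://example.com/robots.txt")` followed by `rp.read()` instead of `rp.parse()`.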

The Future of Robots.txt — Emerging Standards in 2026

The IETF AI Preferences Working Group (AIPREF), chartered in January 2025, is developing a formal mechanism for expressing AI usage preferences. The proposed draft would add a Content-Usage directive to robots.txt, allowing publishers to specify granular permissions for AI training, search indexing, and content generation.

Really Simple Licensing (RSL), launched September 2025, provides XML-based licensing terms for AI content use. RSL is supported by Reddit, Yahoo, Medium, Quora, WebMD, wikiHow, and others.

The llms.txt file, proposed in 2024, places a markdown file at /llms.txt that curates key content for large language models. About 10% of websites have adopted llms.txt, but as of March 2026, no major AI provider has implemented native support for reading it.

For now, robots.txt remains the primary and most widely-respected mechanism for controlling AI crawler access. Our generator supports all current standards and will be updated as new protocols mature.


Free Robots.txt Generator — Create, Validate & Block AI Bots

Generate robots.txt files with 25+ AI bot controls, 12 CMS presets, and real-time validation. UtilHub's Robots.txt Generator gives you complete control over how search engine crawlers and AI bots interact with your website — all from a visual interface that runs entirely in your browser. Nothing is uploaded to our servers, and your configuration data never leaves your device.

As of March 2026, over 5.6 million websites block OpenAI's GPTBot and 5.8 million block Anthropic's ClaudeBot. AI bot traffic quadrupled during 2025, with HUMAN Security reporting a 6,900% year-over-year increase in verified AI agent traffic. Our generator includes all current AI crawlers — training bots, search bots, and user-triggered bots — organized by purpose so you can make informed blocking decisions.

How to use Robots.txt Generator

  • Choose your setup method — Select "Create from Scratch" to build custom rules, "CMS Templates" for WordPress, Shopify, Joomla, and 9 other ready-made configurations, or "Validate & Test" to check an existing robots.txt file. CMS templates include pre-configured rules tested with 12 popular platforms.
  • Configure bot access rules — Set allow or disallow rules for search engine bots (Googlebot, Bingbot, Yandex) and choose from 25+ AI crawlers organized by purpose: training bots (GPTBot, ClaudeBot), search bots (OAI-SearchBot, PerplexityBot), and user-triggered bots. Use quick-action buttons like "Block All AI Training" or "SEO Recommended" for instant configuration. Add restricted directories like /admin/, /search/, or /cart/.
  • Add sitemap and review output — Enter your XML sitemap URL so search engines can discover all your pages. Set crawl-delay if needed for Bing and Yandex. Review the syntax-highlighted output in real-time as you make changes — every edit updates the generated file instantly.
  • Validate, download, and upload — Use the built-in validator to check for errors before deploying. Copy the generated robots.txt to your clipboard or download it as a file. Upload it to your website's root directory — it must be accessible at https://yoursite.com/robots.txt. Test using Google Search Console's Robots.txt Tester.

Features

  • 25+ AI Bot Controls — Block or allow AI training crawlers (GPTBot, ClaudeBot, Google-Extended), AI search crawlers (OAI-SearchBot, PerplexityBot), and user-triggered bots, organized by purpose with clear explanations.
  • 12 CMS Presets — Ready-made templates for WordPress, WooCommerce, Shopify, Joomla, Drupal, Magento, Next.js, Laravel, Angular, Wix, Squarespace, and Webflow with platform-specific best practices.
  • Real-Time Validation — Built-in syntax checker catches errors, warnings, and RFC 9309 compliance issues before you deploy. Color-coded results make problems obvious.
  • URL Path Tester — Test any URL path against your robots.txt rules to verify whether it is allowed or blocked, with the matching rule highlighted.
  • Syntax-Highlighted Preview — Live preview with color-coded directives (User-agent in green, Disallow in red, Allow in blue) updates instantly as you configure rules.
  • Quick Action Presets — One-click configurations including "Block All AI Bots", "SEO Recommended", "Allow Only Search Engines", and "Block Everything" for instant setup.

Frequently Asked Questions

What is a robots.txt file and why do I need one?

A robots.txt file is a plain text file placed in your website's root directory that tells search engine crawlers and bots which pages they can and cannot access. It follows the Robots Exclusion Protocol, formalized as RFC 9309 in September 2022. While search engines will crawl your site without one, a robots.txt file helps manage crawl budget — the limited number of pages Google crawls per visit. By blocking non-essential pages like admin panels, internal search results, and duplicate content, you ensure crawlers focus on your important pages, improving indexing speed and SEO performance.

How do I block AI crawlers like ChatGPT and Claude from scraping my content?

Add specific User-agent rules for each AI bot you want to block. As of March 2026, the major AI crawlers are: GPTBot (OpenAI/ChatGPT training), OAI-SearchBot (ChatGPT Search indexing), ClaudeBot (Anthropic/Claude training), Google-Extended (Gemini AI training), CCBot (Common Crawl, used for AI datasets), PerplexityBot (Perplexity AI), Meta-ExternalAgent (Meta/Llama), Bytespider (ByteDance/TikTok AI), and DeepSeekBot (DeepSeek). For each training bot, add "User-agent: [bot-name]" followed by "Disallow: /" to block your entire site. Over 5.6 million websites now block GPTBot. Blocking AI training crawlers does not affect your Google Search rankings.

What is the difference between Disallow and Noindex?

Disallow in robots.txt prevents crawlers from accessing a page, but it does not guarantee the page won't appear in search results — Google can still index a URL based on external links without crawling it. The noindex meta tag tells search engines not to show a page in results, but crawlers must first access the page to read the directive. For complete removal from search results, do not block the page with robots.txt (so crawlers can read the noindex tag) and add the noindex directive to the page itself. Google officially deprecated the unofficial robots.txt noindex directive on September 1, 2019.
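The tag itself is a single line in the page's head:

```html
<!-- Placed on the page itself. The page must NOT be blocked in
     robots.txt, or crawlers will never see this tag. -->
<meta name="robots" content="noindex">
```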

What is crawl-delay and should I use it?

Crawl-delay tells bots to wait a specified number of seconds between requests to your server. Google does not support crawl-delay — you must use Google Search Console's crawl rate settings instead. Bing respects values of 1-30 seconds, and Yandex also supports it. For most websites, crawl-delay is unnecessary and can actually slow down indexing. Only use it if your server has limited resources and bot traffic causes performance issues — typically shared hosting with high-traffic sites.

Where do I upload the robots.txt file?

The robots.txt file must be placed in your website's root directory, accessible at https://yoursite.com/robots.txt. For traditional hosting, upload via FTP/SFTP to the public_html folder. In WordPress, use Yoast SEO (Tools → File Editor) or Rank Math (General Settings → Edit robots.txt). For Shopify, edit the robots.txt.liquid template in your theme code. On Cloudflare Pages, place it in your public/ source directory. For Angular 21 with SSR, add it to the assets array in angular.json. After uploading, verify by visiting yoursite.com/robots.txt in your browser, then test with Google Search Console.

Can robots.txt protect private or sensitive content?

No. Robots.txt is not a security mechanism — it is publicly readable by anyone at yoursite.com/robots.txt, which actually reveals which directories you consider sensitive. Malicious bots ignore robots.txt entirely, and as of 2026, about 13% of AI bot requests also ignore it. For private content, use proper authentication, server-side access controls (.htaccess, firewall rules), password protection, or IP whitelisting. Robots.txt should only be used for crawl management — controlling how legitimate bots interact with your public content.

Does blocking AI bots affect my Google Search rankings?

No. Blocking AI training crawlers like GPTBot, ClaudeBot, CCBot, or Google-Extended has no effect on your Google Search rankings. Googlebot (the crawler responsible for search indexing) is completely separate from Google-Extended (which controls AI training data use). However, blocking AI search crawlers like OAI-SearchBot or PerplexityBot means your content will not appear in those AI search products, which may reduce your overall traffic from AI-powered search.

What is RFC 9309 and why does it matter for robots.txt?

RFC 9309 is the formal Internet Engineering Task Force (IETF) standard for robots.txt, published in September 2022. Before RFC 9309, robots.txt was a de facto convention with no official specification, leading to inconsistent implementations. The standard formalizes User-agent, Disallow, and Allow as the only recognized directives, standardizes wildcard patterns (* and $), codifies error handling (4xx means full access, 5xx means full disallow), requires crawlers to parse at least the first 500 KiB of the file (content beyond that may be ignored), and specifies that the most specific (longest path) rule wins when conflicts occur. Crawl-delay, Sitemap, Noindex, Host, and Clean-param are excluded from the standard, though Sitemap remains widely supported as an extension.