
What Is robots.txt

robots.txt tells crawlers what they can and can't access on your site. Many brands accidentally use it to block the agents that drive modern product discovery.

The file that controls what every major agent can see on your site is a plain text file at your domain root. Most brands have never looked at it.

Your Site's Front Door for Agents

robots.txt is served at yourdomain.com/robots.txt and formally standardized in RFC 9309[1]. Its core directives are User-agent, Disallow, and Allow, plus the widely supported Sitemap extension[2]. Every well-behaved crawler checks this file before requesting anything else, making it the single most powerful access-control mechanism on your site. It requires zero infrastructure to change; you can edit it in Notepad.
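Because the format is this simple, you can also query a policy programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser; the rules and example.com URLs are illustrative placeholders, not a recommended policy:

```python
# Minimal sketch: parse a robots.txt policy and query it.
# The rules and example.com URLs are illustrative placeholders.
from urllib.robotparser import RobotFileParser

policy = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# A path under /admin/ is blocked; everything else falls through to Allow: /
print(rp.can_fetch("*", "https://example.com/products/widget"))  # True
print(rp.can_fetch("*", "https://example.com/admin/login"))      # False
```

In real use you would point the parser at the live file instead, with rp.set_url("https://yourdomain.com/robots.txt") followed by rp.read().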

The Agent Crawlers You Need to Know

Each agent platform sends its own User-agent string, and you need to make deliberate decisions about each one. Blocking GPTBot does not block ChatGPT-User, and vice versa. Here are the ones that matter today:

  • GPTBot: OpenAI's training and retrieval crawler[3]. Blocking this removes your content from ChatGPT's knowledge base.
  • ChatGPT-User: The real-time browsing crawler that ChatGPT uses when a user asks it to look something up. Blocking GPTBot but allowing this means ChatGPT can still browse your site live.
  • ClaudeBot: Anthropic's crawler[4]. The same logic applies: blocking it means Claude cannot reference your content.
  • Google-Extended: Google's separate crawler for Gemini and AI Overviews, distinct from Googlebot, which handles organic search[5]. Blocking this lets you opt out of AI features without losing your search rankings.
  • PerplexityBot: Perplexity's retrieval crawler. Increasingly active as Perplexity grows its answer engine.
  • Applebot: Serves Apple Intelligence and Siri. Already crawling at scale.
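The "blocking one bot does not block another" point is easy to verify mechanically. A sketch, again using Python's urllib.robotparser, with a hypothetical policy that blocks only GPTBot:

```python
# Sketch: blocking GPTBot leaves ChatGPT-User and ClaudeBot unaffected.
# The policy and URL below are hypothetical.
from urllib.robotparser import RobotFileParser

policy = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(policy.splitlines())

# Each agent is matched against its own User-agent group; unmatched
# agents fall back to the * group.
for agent in ("GPTBot", "ChatGPT-User", "ClaudeBot"):
    allowed = rp.can_fetch(agent, "https://example.com/products/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Only GPTBot is blocked here; ChatGPT-User and ClaudeBot never match the GPTBot group and inherit the permissive default.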

Mistakes That Cost You Visibility

A blanket User-agent: * / Disallow: / block is the nuclear option, and it is more common than you would think. Brands deploy it during a site migration and forget to remove it afterward. Legacy rules written for bots that no longer exist can also inadvertently match new agent User-agents through wildcard patterns.

The other classic error is confusing Disallow with noindex. Disallow tells crawlers not to fetch a page; noindex tells search engines not to index it. They solve different problems, and they do not compose: a disallowed page can still be indexed from external links, and because crawlers never fetch it, they never see its noindex directive. Using the wrong one means the page either gets crawled when you do not want it to, or stays visible in results when you want it gone.
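Concretely, the two directives live in different places. A sketch, with hypothetical paths:

```
# In robots.txt — "do not fetch this path at all"
User-agent: *
Disallow: /internal-reports/

# In the page's HTML <head> — "fetch me, but keep me out of the index"
<meta name="robots" content="noindex">

# Or the same signal as an HTTP response header:
X-Robots-Tag: noindex
```

Pick the directive that matches the outcome you want: Disallow to stop fetching and save crawl budget, noindex to keep a fetchable page out of results.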

How to Get This Right

Open yourdomain.com/robots.txt right now and read it line by line. If you do not recognize every rule, investigate before changing anything. Write explicit User-agent blocks for each agent crawler you want to allow or restrict, and use specific path patterns rather than root-level blocks so you can permit product pages while restricting admin routes.
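For example, a policy along these lines (paths hypothetical) lets OpenAI's crawler into the catalog while keeping it out of account and admin routes, and opts out of Google's AI features entirely:

```
# Per-agent rules: catalog open, sensitive routes closed
User-agent: GPTBot
Allow: /products/
Disallow: /account/
Disallow: /admin/

# Opt out of Gemini and AI Overviews without touching organic search
User-agent: Google-Extended
Disallow: /
```

Because Googlebot is a separate User-agent, the Google-Extended block leaves organic crawling untouched.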

Test every change before deploying, for example with Google Search Console's robots.txt tooling[6]. One misplaced wildcard can make your entire catalog invisible to agents overnight.
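You can also automate that test: keep a list of URLs that must stay crawlable and check a candidate robots.txt against it before it ships. A sketch; the policy, agent names, and URLs are placeholders:

```python
# Sketch: regression-test a candidate robots.txt before deploying it.
# The policy, agent names, and URLs below are illustrative placeholders.
from urllib.robotparser import RobotFileParser

candidate = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

MUST_STAY_VISIBLE = [
    ("GPTBot", "https://example.com/products/blue-widget"),
    ("PerplexityBot", "https://example.com/collections/sale"),
]

rp = RobotFileParser()
rp.parse(candidate.splitlines())

# Collect any (agent, url) pair the candidate policy would block.
blocked = [(a, u) for a, u in MUST_STAY_VISIBLE if not rp.can_fetch(a, u)]
if blocked:
    raise SystemExit(f"candidate robots.txt blocks: {blocked}")
print("all critical URLs remain crawlable")
```

Wire a check like this into the deploy pipeline and a stray Disallow fails the build instead of silently hiding your catalog.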

How Scanner Helps

Scanner checks whether major agent crawlers are blocked by your robots.txt, identifies the specific rule responsible, and flags accidental blocks that could be costing you visibility in agent-powered discovery.

Sources

  1. RFC 9309: Robots Exclusion Protocol
  2. Google: robots.txt
  3. OpenAI: GPTBot
  4. Anthropic: ClaudeBot
  5. Google: Google-Extended
  6. Google: robots.txt Tester

See how your site scores.

Run a free scan at point11.ai to check your robots.txt and 40+ other metrics.



robots.txt Rules

An annotated example for a fictional storefront at example.com:

  # robots.txt for example.com

  User-agent: *
  Allow: /products/          # visible
  Allow: /collections/       # visible
  Allow: /pages/about        # visible
  Disallow: /cart            # blocked
  Disallow: /checkout        # blocked
  Disallow: /admin/          # blocked
  Disallow: /search?*        # blocked

  User-agent: GPTBot
  Allow: /products/          # visible
  Disallow: /                # everything else blocked

  Sitemap: https://example.com/sitemap.xml

Allow rules expose content to AI agents. Disallow keeps sensitive paths hidden.