TL;DR. The number one SEO gap: your WAF or Cloudflare ruleset is silently blocking GPTBot, CCBot, PerplexityBot, and ClaudeBot, or your robots.txt is silently disallowing them, so they never reach your page. Test it with the Rankscale Page Audit V2 in seconds. Fix time: about an hour. Skip this and the rest of this module does not matter.
The 4 crawlers to allow
These are the named user agents that matter for AI visibility right now:
| Engine | User agent string | Why it matters |
|---|---|---|
| OpenAI ChatGPT | GPTBot | Live web retrieval for ChatGPT Search |
| Common Crawl (training + retrieval) | CCBot | Feeds many LLMs, including Claude and open-source models |
| Perplexity | PerplexityBot | Live retrieval for every Perplexity answer |
| Anthropic Claude | ClaudeBot (also anthropic-ai) | Live retrieval for Claude web search |
Add more as new engines emerge (Mistral, DeepSeek). When in doubt, allow the crawler. Most teams lose AI visibility by over-blocking, not by allowing the wrong bot.
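Before touching production, you can spot-check a robots.txt against these four user agents with Python's built-in urllib.robotparser. A minimal sketch; the sample robots.txt below deliberately reproduces the common anti-pattern, and the URL is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# The four AI crawler user agents from the table above.
AI_CRAWLERS = ["GPTBot", "CCBot", "PerplexityBot", "ClaudeBot"]

# Sample robots.txt showing the anti-pattern: GPTBot is blocked site-wide.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

def allowed_crawlers(robots_txt: str, url: str) -> dict:
    """Return {user_agent: can_fetch} for each AI crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: parser.can_fetch(ua, url) for ua in AI_CRAWLERS}

results = allowed_crawlers(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
for ua, ok in results.items():
    print(f"{ua}: {'allowed' if ok else 'BLOCKED'}")
```

Swap in your own robots.txt content and a real page URL; any `BLOCKED` line is a crawler that will never see that page.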
Diagnose in 3 steps (10 minutes)
Step 1: Test bot access. Use Rankscale's Page Audit V2 to check whether all bots are allowed. It checks every bot from each AI search engine.
Step 2: Check WAF and CDN rules. Cloudflare, Fastly, and AWS WAF ship with "AI Scraper" or "AI Bot" block rulesets that override robots.txt. Log into your CDN console and search for rules targeting GPTBot, CCBot, or "AI scrapers." Disable them or add explicit allow rules for the four user agents above.
If your team enabled the Cloudflare "Block AI Scrapers and Crawlers" toggle to protect content: turn it off for the pages you want AI to cite. Leave it on for paywalled or gated content only.
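A WAF block will not show up in robots.txt, but it often shows up as a 403 when you request a page with a crawler's User-Agent. This sketch (stdlib only; the URL is a placeholder) compares status codes; note it only approximates real crawler traffic, since some WAFs also verify crawler IP ranges, so treat a 403 here as a strong hint rather than proof:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError

def crawler_request(url: str, user_agent: str) -> Request:
    """Build a request that identifies itself with a crawler's User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})

def status_for(url: str, user_agent: str) -> int:
    """Fetch url with the given User-Agent and return the HTTP status code."""
    try:
        with urlopen(crawler_request(url, user_agent), timeout=10) as resp:
            return resp.status
    except HTTPError as err:  # a WAF's 403 is raised as HTTPError
        return err.code

# Example (network required): a 403 for GPTBot but 200 for a browser UA
# suggests a WAF rule, not robots.txt, is doing the blocking.
# print(status_for("https://yourdomain.com/", "GPTBot"))
```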
Step 3: Read robots.txt. Open https://yourdomain.com/robots.txt. Look for any Disallow: / rule targeting one of the four user agents above. A common anti-pattern:
```
User-agent: GPTBot
Disallow: /
```
This blocks ChatGPT entirely. Remove the Disallow, or scope it narrowly (e.g. Disallow: /admin/).
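A corrected robots.txt keeps crawlers out of genuinely private paths while leaving the rest of the site open. The paths here are illustrative; per the Robots Exclusion Protocol, multiple User-agent lines can share one rule group:

```
User-agent: GPTBot
User-agent: CCBot
User-agent: PerplexityBot
User-agent: ClaudeBot
Disallow: /admin/
Disallow: /checkout/
```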
The "not indexed on Google" check
Google AI Overviews and AI Mode source directly from Google's index. If the page is crawled but not indexed, it will not appear in AI Overviews regardless of how good it is. Check Google Search Console → URL Inspection. If the status is anything other than "URL is on Google," fix the indexing issue (canonical tags, noindex directives, duplicate content) before moving on. Search Console's documentation describes how to fix each of these.
Fix patterns
- robots.txt too broad: replace a blanket Disallow: / with narrow paths (/admin/, /checkout/, /api/).
- WAF catching legitimate bots: add explicit Allow rules for the four user agents above.
- Rate-limits too aggressive: whitelist bot user agents or increase request thresholds for known crawlers.
- IP-based blocking: AI crawlers come from cloud IP ranges. If you block cloud IPs, you block them. Move to user-agent-based rules.
Do this now:
Run Rankscale's Page Audit V2 to check that every bot can access your website. If any check fails, fix robots.txt or your WAF this morning. Retest within the hour.
Start improving your AI visibility today with Rankscale.
Get started