Should businesses block all AI crawlers?

Usually no. Blocking everything can reduce AI search visibility for legitimate service, FAQ and educational pages. A better starting point is to separate Search, Agent and Training use, then decide what each content type should allow.

What is the difference between AI search and AI training?

AI search access helps systems crawl, index or retrieve content for discovery and answers. AI training access is about using content to train or fine-tune models. A business may want the first without automatically allowing the second.

Does this replace robots.txt?

No. Cloudflare controls are one implementation layer. Businesses still need a clean robots.txt file, accurate sitemap entries, structured data, page quality and internal governance so that crawler rules match the content strategy.

What pages should stay visible to AI search?

For most SMBs, public service pages, FAQs, case studies, location pages and practical blog posts should remain visible because they help buyers and answer engines understand the business accurately.

AI Crawler Controls: Search, Agents and Training Access

Cloudflare now separates AI crawler access into Search, Agent and Training controls. For Australian SMBs, the practical move is to treat crawler access as a visibility policy, not a single block-or-allow switch.

Why do AI crawler controls matter now?

Cloudflare announced new AI traffic controls on 1 July 2026. Instead of treating AI bots as one broad category, site owners can manage AI traffic by behaviour: Search, Agent and Training.

That distinction matters for GEO and SEO because visibility and content use are no longer the same decision. A service page, FAQ or buying guide may need to be available to search and answer engines. A paid resource, original dataset or ad-supported content page may need stronger controls around automated use and model training.

For business leaders, the question is no longer simply whether AI systems can read the site. The better question is which parts of the site should be searchable, which parts can be used by agents acting for users, and which parts should not be used for training or fine-tuning.

What did Cloudflare change?

Cloudflare's blog and product changelog confirm that customers can now manage AI crawler behaviour across three categories:

Search: crawling and indexing that can help content appear in search or answer contexts.
Agent: automated access where an AI agent acts on behalf of a user, such as fetching a page or interacting with a site.
Training: use of website content to train or fine-tune AI models.

Cloudflare's docs also describe three control choices: block on all pages, block only on pages with ads, or allow. The same documentation says multi-purpose crawlers that combine Search and Training behaviour can be affected by the Training setting.

The change to defaults is specific: from 15 September 2026, new domains onboarding to Cloudflare will block Agent and Training bots by default on pages with ads, while Search remains allowed. Existing and new customers can still adjust their own settings.
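The three control choices and the new-domain default described above can be modelled in a few lines. The Python sketch below is illustrative only: `Behaviour`, `Setting`, `NEW_DOMAIN_DEFAULTS` and `is_blocked` are hypothetical names, not part of any Cloudflare API. It also assumes that a multi-purpose crawler is blocked as soon as any one of its behaviours is blocked, which is a simplification, since the documentation only says such crawlers can be affected by the Training setting.

```python
from enum import Enum


class Behaviour(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


class Setting(Enum):
    ALLOW = "allow"
    BLOCK_ON_AD_PAGES = "block_on_ad_pages"
    BLOCK_ALL = "block_all"


# Defaults the article describes for new domains from 15 September 2026.
NEW_DOMAIN_DEFAULTS = {
    Behaviour.SEARCH: Setting.ALLOW,
    Behaviour.AGENT: Setting.BLOCK_ON_AD_PAGES,
    Behaviour.TRAINING: Setting.BLOCK_ON_AD_PAGES,
}


def is_blocked(crawler_behaviours, page_has_ads, settings=NEW_DOMAIN_DEFAULTS):
    """Return True if any of the crawler's behaviours is blocked on this page.

    Assumption (not confirmed by the source): one blocked behaviour is
    enough to block a multi-purpose crawler.
    """
    for behaviour in crawler_behaviours:
        setting = settings[behaviour]
        if setting is Setting.BLOCK_ALL:
            return True
        if setting is Setting.BLOCK_ON_AD_PAGES and page_has_ads:
            return True
    return False
```

Under these assumptions, a crawler that combines Search and Training would be blocked on an ad-supported page but allowed on a page without ads, which is why the Training setting deserves attention even when the goal is search visibility.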

How should SMBs think about Search, Agent and Training?

A practical crawler policy starts with intent.

Search access is usually about discoverability. Your homepage, service pages, case studies, FAQs and educational articles often need to be readable by search and answer systems if you want the business to be found in AI-assisted buying journeys.

Agent access is about action. If an AI assistant is fetching your booking page, reading support information or helping a customer compare services, that can be useful. But if it interacts with forms, gated flows, account areas or commercial workflows, the risk profile changes.

Training access is about reuse. Many businesses are comfortable with public pages being indexed, but less comfortable with original frameworks, pricing intelligence, paid content, client-sensitive examples or proprietary datasets being absorbed into model training without a commercial arrangement.

What should be open, limited or blocked?

Start with a simple policy table instead of a technical guess.

Keep searchable: homepage, services, public FAQs, comparison pages, educational blog posts, location pages and case studies written for discovery.
Review before agent access: forms, booking flows, quote requests, gated lead magnets, product configurators and any page that can trigger a workflow.
Limit training use: proprietary research, paid resources, original templates, internal knowledge bases, detailed pricing logic, client-sensitive examples and unique datasets.
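One way to keep this table honest is to record it as data and list the cells nobody has decided yet. The sketch below is a minimal Python example, assuming hypothetical content-type labels and the decision words "allow", "review" and "limit"; it fills in only the decisions stated above and leaves the rest as gaps for the business to resolve.

```python
BEHAVIOURS = ("search", "agent", "training")

# Only the decisions stated in the text are recorded. Everything else is
# deliberately left out so the remaining gaps stay visible.
POLICY = {
    "service_page": {"search": "allow"},
    "public_faq": {"search": "allow"},
    "booking_flow": {"agent": "review"},
    "quote_request_form": {"agent": "review"},
    "paid_resource": {"training": "limit"},
    "proprietary_research": {"training": "limit"},
}


def decision(content_type, behaviour):
    """Look up the recorded decision, or 'undecided' if none is recorded."""
    return POLICY.get(content_type, {}).get(behaviour, "undecided")


def undecided():
    """List (content_type, behaviour) pairs that still need a decision."""
    return [
        (content_type, behaviour)
        for content_type in POLICY
        for behaviour in BEHAVIOURS
        if behaviour not in POLICY[content_type]
    ]
```

Calling `undecided()` on this starting table returns every pair that still needs an owner and a decision, for example whether a booking flow should be open to Search or Training.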

This does not replace robots.txt, sitemap hygiene, schema markup or page quality. It adds a governance layer above them. GEO work now includes both making the right content legible and making access rules explicit.

What should you do this week?

1. Map your public content. List the pages that should help buyers understand your business in AI search: services, FAQs, case studies, articles and contact pathways.
2. Separate high-value content. Keep public discovery content apart from proprietary, paid or client-sensitive material so crawler rules can be clearer.
3. Decide by behaviour. For each content type, record whether Search, Agent and Training access should be allowed, limited or blocked.
4. Check the implementation layer. Review Cloudflare settings if you use Cloudflare, then align robots.txt, sitemap entries, structured data and page copy with the same policy.
5. Review after major changes. Re-check crawler policy when you add a new service line, paid content, lead form, client case study or automation workflow.
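For step 4, a quick consistency check can confirm that robots.txt does not contradict the policy. The sketch below uses Python's standard `urllib.robotparser` module. The bot names `ExampleSearchBot` and `ExampleTrainingBot`, the sample rules and the paths are hypothetical placeholders; substitute the user-agent tokens published by the crawlers you care about. Note that robots.txt matches on user agent, not on behaviour, so this check complements the Cloudflare settings and does not reproduce them.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, user_agent, paths):
    """Return the paths that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical rules: the search bot is kept out of a members area,
# the training bot is kept out of the whole site.
SAMPLE_ROBOTS = """\
User-agent: ExampleSearchBot
Disallow: /members/

User-agent: ExampleTrainingBot
Disallow: /
"""

# Hypothetical pages that the policy says should stay searchable.
PUBLIC_PAGES = ["/services/", "/faq/", "/case-studies/"]

# Any path printed here is a public page the search bot cannot reach.
print(blocked_paths(SAMPLE_ROBOTS, "ExampleSearchBot", PUBLIC_PAGES))
```

An empty result for the search bot means robots.txt is not hiding pages the policy wants discoverable. Running the same check for the training bot shows which pages it is asked to stay away from.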

RxAI helps Australian businesses prepare websites for AI search without giving up control of valuable content. Start with our GEO and AI-ready website services or use the contact page to map a practical crawler policy for your site.
