The State of AI on the Open Web
How 19.6M live websites treat AI crawlers — who blocks ChatGPT, Claude, Perplexity & Google AI, by industry and country.
Source: Piperic index of 19,635,963 live, content-validated domains · last updated 2026-06-28
5.2% · block ≥1 AI crawler · of the live web
4.69% · block GPTBot · the most-blocked AI bot
10.0% · publish llms.txt · AI-readable index
0.2% · run an AI chatbot · live AI on-site
🤖 The most-blocked AI crawlers
Share of the live web that disallows each bot in robots.txt.
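A minimal sketch of how a per-bot "disallows in robots.txt" check could work, using Python's standard-library robots.txt parser. The user-agent tokens in AI_BOTS, the sample robots.txt, and the choice to test only the site root are assumptions for illustration; the index's actual bot list and blocking definition are not stated on this page.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; the exact list the index tracks is not stated here.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, bots=AI_BOTS, url="https://example.com/"):
    """Return the bots that may not fetch `url` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt: two AI bots blocked site-wide, everyone else
# only kept out of /admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_bots(SAMPLE))  # ['GPTBot', 'ClaudeBot']
```

Testing only the root path is a simplification: a site that disallows a bot from a single subdirectory would not count as blocking under this sketch.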
🏭 Industries blocking AI most
🕊️ …and the most AI-open
personal-celebrations-and-life-events · 3.3%
🌍 Countries blocking AI most
⚖️ AI-training policy signal
What sites declare about AI training (robots.txt + meta TDM + ToS).
920,764 · block GPTBot · OpenAI's training crawler
5% · of them also block ChatGPT Search · → likely unintentionally invisible in ChatGPT search results
0.7% · ToS prohibits AI training
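To illustrate the training-versus-search distinction behind these figures, here is a hedged sketch of a robots.txt that opts out of training while staying visible to search. It assumes "ChatGPT Search" refers to OpenAI's OAI-SearchBot user agent; the sample file and the allowed() helper are hypothetical, not part of the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, keep the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """True if `agent` may fetch `url` according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ROBOTS_TXT, "GPTBot"))         # False: training crawler blocked
print(allowed(ROBOTS_TXT, "OAI-SearchBot"))  # True: search crawler still allowed
```

A blanket rule such as "User-agent: *" with "Disallow: /" would block both agents, which is one way a site could end up excluded from search without intending it.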
🧱 What the web runs on
Top analytics
Google Analytics · 5,621,763
Google Tag Manager · 5,068,150
Cloudflare Analytics · 191,625
🗂️ The web by category, language & country
Top categories
business-and-finance · 3,955,512
technology-and-computing · 2,354,597
hobbies-and-interests · 780,539
Top countries (by live domains)
📇 How reachable the web is
Methodology: signals are read from each domain's homepage and robots.txt across Piperic's index of live, content-validated websites (HTTP 2xx, not parked). Percentages are shares of the live web. Updated weekly.
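A rough sketch of how a "share of the live web" figure could be derived under the stated filter (HTTP 2xx, not parked). The Domain record, its field names, and the sample rows are hypothetical; content validation is not modeled because the page does not describe it. The last lines reproduce the headline GPTBot share from the two counts published above.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    status: int          # HTTP status of the homepage fetch
    parked: bool         # True if the domain is a parked page
    blocks_gptbot: bool  # True if robots.txt disallows GPTBot


def is_live(d: Domain) -> bool:
    """Live means an HTTP 2xx homepage that is not parked."""
    return 200 <= d.status < 300 and not d.parked


def share(domains, predicate) -> float:
    """Percentage of live domains for which `predicate` holds."""
    live = [d for d in domains if is_live(d)]
    if not live:
        return 0.0
    return 100 * sum(1 for d in live if predicate(d)) / len(live)


# Hypothetical sample: only the first two rows count as live.
sample = [
    Domain("a.example", 200, False, True),
    Domain("b.example", 200, False, False),
    Domain("c.example", 404, False, True),
    Domain("d.example", 200, True, True),
]
print(share(sample, lambda d: d.blocks_gptbot))  # 50.0

# Headline check using the counts published on this page.
gptbot_share = 100 * 920_764 / 19_635_963
print(round(gptbot_share, 2))  # 4.69
```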