
The State of AI on the Open Web

How 19.6M live websites treat AI crawlers — who blocks ChatGPT, Claude, Perplexity & Google AI, by industry and country.

Source: Piperic index of 19,635,963 live, content-validated domains · last updated 2026-06-28

5.2% block ≥1 AI crawler (of the live web)
4.69% block GPTBot (the most-blocked AI bot)
10.0% publish llms.txt (AI-readable index)
0.2% run an AI chatbot (live AI on-site)
🤖 The most-blocked AI crawlers

Share of the live web that disallows each bot in robots.txt.

GPTBot: 4.69%
CCBot: 4.68%
ClaudeBot: 4.45%
Amazonbot: 4.44%
Google-Extended: 4.41%
Bytespider: 4.40%
Meta-ExternalAgent: 4.28%
Applebot-Extended: 4.22%
anthropic-ai: 0.54%
ChatGPT-User: 0.46%
PerplexityBot: 0.41%
FacebookBot: 0.39%
Omgilibot: 0.32%
Claude-Web: 0.29%
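
As an illustration of the robots.txt signal counted above, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a bot counts as "blocked" when it may not fetch the site root `/`; the report does not say exactly how it defines a disallow, so treat that rule, the `blocked_bots` helper, and the `SAMPLE` file as hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as listed in the report.
AI_BOTS = [
    "GPTBot", "CCBot", "ClaudeBot", "Amazonbot", "Google-Extended",
    "Bytespider", "Meta-ExternalAgent", "Applebot-Extended", "anthropic-ai",
    "ChatGPT-User", "PerplexityBot", "FacebookBot", "Omgilibot", "Claude-Web",
]


def blocked_bots(robots_txt: str, bots=AI_BOTS) -> list:
    """Return the bots that may not fetch "/" under the given robots.txt text.

    Assumption: "blocked" means the root path is disallowed for that bot.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


# Hypothetical robots.txt: two AI crawlers blocked, everything else allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))  # ['GPTBot', 'CCBot']
```

A site with an empty or missing robots.txt yields an empty list here, which matches the "allowed" default of the robots exclusion convention.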
🏭 Industries blocking AI most
attractions: 11.9%
video-gaming: 9.6%
sensitive-topics: 8.9%
shopping: 7.7%
pop-culture: 7.4%
🕊️ …and the most AI-open
law: 3.1%
fine-art: 3.2%
personal-celebrations-and-life-events: 3.3%
home-and-garden: 3.6%
events: 3.6%
🌍 Countries blocking AI most
CO: 4.8%
ES: 4.2%
GB: 4.1%
PL: 4.0%
BR: 3.8%
SE: 3.8%
🌱 …and the most AI-open
DE: 1.7%
AT: 1.8%
FR: 2.1%
IT: 2.3%
CZ: 2.6%
NL: 2.8%
⚖️ AI-training policy signal

What sites declare about AI training (robots.txt + meta TDM + ToS).

allowed: 93.8%
reserved: 5.2%
prohibited: 1.1%
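
The "meta TDM" signal is not defined further in the report. One plausible reading is the `tdm-reservation` meta tag from the W3C community TDM Reservation Protocol, where `content="1"` reserves text-and-data-mining rights. The sketch below detects that tag under this assumption; `TdmMetaParser` and `tdm_reserved` are hypothetical names, and how Piperic maps the result to allowed / reserved / prohibited is not stated in the source.

```python
from html.parser import HTMLParser
from typing import Optional


class TdmMetaParser(HTMLParser):
    """Records the content of the last <meta name="tdm-reservation"> tag seen."""

    def __init__(self):
        super().__init__()
        self.reservation = None  # None until a tdm-reservation tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "tdm-reservation":
            self.reservation = attr.get("content", "").strip()


def tdm_reserved(html: str) -> Optional[bool]:
    """True if rights are reserved, False if explicitly not, None if undeclared."""
    parser = TdmMetaParser()
    parser.feed(html)
    if parser.reservation is None:
        return None
    return parser.reservation == "1"


page = '<html><head><meta name="tdm-reservation" content="1"></head></html>'
print(tdm_reserved(page))
```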
920,764 block GPTBot (OpenAI's training crawler)
5% of them also block ChatGPT Search (→ accidentally invisible in ChatGPT)
0.1% publish ai.txt
0.7% have a ToS that prohibits AI training
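
To make the training-versus-search distinction concrete, the sketch below compares two hypothetical robots.txt files. It assumes "ChatGPT Search" corresponds to OpenAI's `OAI-SearchBot` user agent, which the report does not name, and that blocking means disallowing `/`. The last line turns the rounded 5% figure into an approximate site count.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str) -> bool:
    """True if `agent` may fetch the site root under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "/")


# Hypothetical file that opts out of training only.
TRAINING_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Hypothetical file that also blocks the search crawler (assumed: OAI-SearchBot).
TRAINING_AND_SEARCH = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
Disallow: /
"""

for name, text in [("training only", TRAINING_ONLY),
                   ("training + search", TRAINING_AND_SEARCH)]:
    print(name,
          "| GPTBot allowed:", can_crawl(text, "GPTBot"),
          "| OAI-SearchBot allowed:", can_crawl(text, "OAI-SearchBot"))

# Rough count behind "5% of them": the 5% is itself rounded in the report.
approx_also_blocking_search = round(920_764 * 0.05)
print(approx_also_blocking_search)
```

Under these assumptions, only the second file would keep a site out of ChatGPT's search results, which is the situation the report flags as likely accidental.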
🧱 What the web runs on

Top CMS

WordPress: 6,913,853
Wix: 677,552
Weebly: 327,785
Shopify: 291,855
Squarespace: 228,697
Joomla: 116,795
Drupal: 72,441
Jimdo: 66,375

Top e-commerce

WooCommerce: 2,669,949
Shopify: 284,928
Magento: 186,933
Lightspeed: 100,163
Wix Stores: 92,696
Spree Commerce: 77,334
Square Online: 46,601
X-Cart: 31,176

Top analytics

Google Analytics: 5,621,763
Google Tag Manager: 5,068,150
Sentry: 721,779
Facebook Pixel: 576,605
Microsoft Clarity: 220,426
Cloudflare Analytics: 191,625

Top CDN

Google Fonts: 8,114,252
Cloudflare: 6,267,504
cdnjs: 1,545,463
jsDelivr: 1,272,777
CloudFront: 896,646
Vercel: 778,659

Top payment providers

Stripe: 84,834
PayPal: 54,338
Square: 35,672
Klarna: 12,333
Razorpay: 6,747
Checkout.com: 6,079

Consent / CMP (EU)

Complianz: 295,180
Cookiebot: 268,024
CookieYes: 131,294
OneTrust: 79,059
Iubenda: 49,370
Usercentrics: 42,023
🗂️ The web by category, language & country

Top categories

business-and-finance: 3,955,512
technology-and-computing: 2,354,597
home-and-garden: 1,059,263
attractions: 981,584
hobbies-and-interests: 780,539
style-and-fashion: 714,945
real-estate: 709,228
personal-finance: 696,065
medical-health: 695,001
healthy-living: 565,957

Top languages

en: 11,660,969
de: 1,154,822
es: 1,084,040
fr: 966,066
ja: 846,182
zh: 692,843
pt: 467,251
tr: 327,269
nl: 323,441
it: 238,237

Top countries (by live domains)

DE: 569,763
FR: 197,813
NL: 193,922
CH: 109,802
BR: 98,726
CO: 85,149
SE: 72,401
IT: 65,092
PL: 61,542
GB: 59,477
CZ: 44,701
AT: 43,782
📇 How reachable the web is
38.4% have an email
19.6% have a phone
39.2% have social links
0.3% publish humans.txt

Methodology: signals read from each domain's homepage + robots.txt across Piperic's index of live, content-validated websites (HTTP 2xx, not parked). Percentages are of the live web. Updated weekly.
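
Since percentages are stated as shares of the live web, any raw count in this report can be converted with the index size from the source line (19,635,963 domains). A small worked example, where `share_of_live_web` is a hypothetical helper name:

```python
LIVE_DOMAINS = 19_635_963  # index size stated in the report's source line


def share_of_live_web(count: int, total: int = LIVE_DOMAINS) -> float:
    """Express a raw domain count as a percentage of the live web."""
    return 100 * count / total


# 920,764 domains block GPTBot, consistent with the 4.69% headline figure.
print(f"GPTBot blockers: {share_of_live_web(920_764):.2f}%")  # GPTBot blockers: 4.69%

# The same conversion applies to technology counts, e.g. WordPress.
print(f"WordPress: {share_of_live_web(6_913_853):.1f}%")  # WordPress: 35.2%
```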