Piperic
live · 17,180,724 domains
About Piperic

Clean intelligence from the live web.

Piperic turns the open web into structured, honest business intelligence, including a layer no other database publishes: public AI-policy signals showing which sites block AI crawlers and which signal that AI training is allowed.

Since 2007, deep inside digital systems

Two decades of taking systems apart — to understand them.

I'm Rácz-Akácosi Attila, an independent Central-European builder. For nearly two decades I've worked deep inside digital systems — reverse-engineering how they really behave, building predictive models, and applying machine learning to messy, real-world data long before it was fashionable.

One thread runs through all of it: digital responsibility — that data should be handled honestly and signals measured, not guessed. Piperic is where that meets the open web: no vanity metrics, no inflated database — just the live web, measured carefully and published as it actually is.

Rácz-Akácosi Attila
Founder · Piperic
2007 · Reverse-engineering digital systems
2009 · Digital-responsibility work
2017 · First machine-learning projects
2018 · Predictive analytics & AI
2024–26 · AI-rights & web intelligence: Piperic
What we do

We map the living web — and only the living web.

We crawl the open web at scale and enrich every site we keep: its category, technology stack, contact details, and AI policy (robots rules, llms.txt, terms-of-service signals). The crawler reads each homepage the way a careful analyst would, then we structure the result so it's searchable, segmentable and exportable.

Crucially, we publish only what's alive — HTTP 2xx, not parked, not a dead redirect. A smaller, honest dataset beats a bloated one every time, especially when you're building a list you'll actually contact.
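The "alive" rule above can be sketched as a small predicate. This is only an illustration under stated assumptions, not Piperic's actual pipeline: the `FetchResult` shape, the `looks_parked` flag (standing in for whatever parked-page detector is used) and the reading of "dead redirect" as "ended up on a different host" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FetchResult:
    """Outcome of fetching one homepage (hypothetical shape, for illustration)."""
    domain: str          # domain that was requested
    status: int          # final HTTP status code after following redirects
    final_host: str      # host the request ended up on
    looks_parked: bool   # verdict of some parked-page detector (assumed to exist)


def _strip_www(host: str) -> str:
    """Normalise a host so example.com and www.example.com compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def is_live(result: FetchResult) -> bool:
    """Keep a site only if it answers 2xx, is not parked, and did not
    redirect away to a different host (assumed meaning of 'dead redirect')."""
    if not 200 <= result.status < 300:
        return False
    if result.looks_parked:
        return False
    return _strip_www(result.final_host) == _strip_www(result.domain)
```

Under these assumptions, `is_live(FetchResult("example.com", 200, "www.example.com", False))` keeps the site, while a 404, a parked page, or a redirect to an unrelated host drops it.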

17.2M live domains indexed
16.9M categorized (IAB taxonomy)
5.0M with a contact email
~2–3 weeks re-crawl cadence
Who it's for

From raw websites to usable segments.

🎯

Sales & outreach

Build targeted lead lists by CMS, e-commerce platform, category, country, language and public contact coverage.

🏢

Agencies

Find companies running a specific stack — a CMS, a payment provider, an analytics tool — for pitches, migrations and partnerships.

⚖️

AI, legal & research

Track public AI-crawler policy signals at web scale: who blocks AI crawlers, who publishes llms.txt/ai.txt, who exposes no detectable policy.

📦

Data buyers

Clean, exportable website intelligence — live sites only, in CSV, JSON or XLSX.

What makes it different

Principles, not promises.

🤖

AI-policy intelligence

Public AI-policy signals no other web database surfaces: which sites block AI crawlers, and which signal that AI training is allowed (via robots rules, llms.txt/ai.txt, or ToS). A new axis for research, compliance and outreach.

💎

Quality over quantity

Only live, validated sites. Parked domains, dead redirects and empty llms.txt files are filtered out by default — so the data you export is data you can use.

🛡️

Built with respect

One-click self-service removal, nofollow on every outbound link, and a public crawler-info page. Public pages show contact availability with masked details; full contact exports are gated for business-contact use.

🌍

Independent & European

A bootstrapped, data-driven project from Central Europe. No investors steering the data — just a careful, transparent method.

How we do it

A method you can check.

01. Public signals only. We read public homepages and public policy files (robots.txt, llms.txt, ai.txt, terms of service); nothing behind a login.
02. Filtered by default. Parked domains, dead redirects and obvious junk are removed; we keep live, usable sites.
03. Classified, not guessed. Category (IAB taxonomy), technology stack, contact coverage and AI-policy signals are measured per site, not assumed.
04. Privacy-first. Public pages mask contact details; full contact exports are gated and built around business-contact use cases.
05. Owner control. Any site owner can request removal in one click, with no account and no friction.
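
For readers curious how one such signal could be measured, here is a minimal sketch that reads a site's robots.txt and reports which AI crawlers it blocks. It uses Python's standard `urllib.robotparser`; the `AI_CRAWLERS` list and the `blocked_ai_crawlers` helper are illustrative assumptions, not Piperic's own code, and the sketch covers robots rules only (not llms.txt, ai.txt or terms of service).

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens for AI crawlers; which agents to track
# is an assumption made for this example.
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended"]


def blocked_ai_crawlers(robots_txt: str,
                        site_url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt text bars from site_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS
            if not parser.can_fetch(agent, site_url)]


# A robots.txt that blocks one AI crawler and allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_ai_crawlers(sample)
```

Checking only the homepage URL keeps the signal simple: a site counts as blocking a crawler when that crawler may not fetch `/`. A fuller measurement would likely need to handle partial path rules as well.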

Let's talk.

Questions, a custom dataset, or just curious how a signal is measured? Write to us — we reply within one business day.

hello@piperic.com