About Piperic — clean intelligence from the live web

Two decades of taking systems apart — to understand them.

I'm Rácz-Akácosi Attila, an independent Central European builder. For nearly two decades I've worked deep inside digital systems: reverse-engineering how they really behave, building predictive models, and applying machine learning to messy, real-world data long before it was fashionable.

One thread runs through all of it: digital responsibility, the conviction that data should be handled honestly and that signals should be measured, not guessed. Piperic is where that conviction meets the open web: no vanity metrics, no inflated database, just the live web, measured carefully and published as it actually is.

Rácz-Akácosi Attila

Founder · Piperic

2007: Reverse-engineering digital systems

2009: Digital-responsibility work

2017: First machine-learning projects

2018: Predictive analytics & AI

2024–26: AI-rights & web intelligence (Piperic)

We map the living web — and only the living web.

We crawl the open web at scale and enrich every site we keep: its category, technology stack, contact details, and AI policy (robots rules, llms.txt, terms-of-service signals). The crawler reads each homepage the way a careful analyst would, then we structure the result so it's searchable, segmentable and exportable.

Crucially, we publish only what's alive — HTTP 2xx, not parked, not a dead redirect. A smaller, honest dataset beats a bloated one every time, especially when you're building a list you'll actually contact.
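The liveness rule above can be sketched in a few lines. This is a minimal illustration, not Piperic's actual pipeline: the `CrawlResult` record, its field names, and the reading of "dead redirect" as a redirect chain that never reaches a page are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlResult:
    """Hypothetical per-domain crawl record; field names are illustrative."""
    domain: str
    status: int                # final HTTP status after following redirects
    final_url: Optional[str]   # where the request ended up, or None
    parked: bool               # True if a parked-domain page was detected


def is_live(result: CrawlResult) -> bool:
    """Keep a site only if it answered 2xx, is not parked, and resolved to a page."""
    if not 200 <= result.status <= 299:
        return False
    if result.parked:
        return False
    if not result.final_url:   # assumed meaning of "dead redirect"
        return False
    return True


# Made-up sample records, one per rejection reason plus one survivor.
results = [
    CrawlResult("example.com", 200, "https://example.com/", parked=False),
    CrawlResult("parked.example", 200, "https://parked.example/", parked=True),
    CrawlResult("gone.example", 404, "https://gone.example/", parked=False),
    CrawlResult("loop.example", 200, None, parked=False),
]

live = [r.domain for r in results if is_live(r)]
```

In this sketch only `example.com` passes all three checks; how parked pages are detected is left out, since the page does not describe it.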

17.2M live domains indexed

16.9M categorized (IAB taxonomy)

5.0M with a contact email

Re-crawl cadence: roughly every 2–3 weeks

From raw websites to usable segments.

🎯

Sales & outreach

Build targeted lead lists by CMS, e-commerce platform, category, country, language and public contact coverage.

🏢

Agencies

Find companies running a specific stack — a CMS, a payment provider, an analytics tool — for pitches, migrations and partnerships.

⚖️

AI, legal & research

Track public AI-crawler policy signals at web scale: who blocks AI crawlers, who publishes llms.txt/ai.txt, who exposes no detectable policy.

📦

Data buyers

Clean, exportable website intelligence — live sites only, in CSV, JSON or XLSX.

Principles, not promises.

🤖

AI-policy intelligence

Public AI-policy signals no other web database surfaces: which sites block AI crawlers, and which signal that AI training is allowed (robots rules, llms.txt/ai.txt, ToS). A new axis for research, compliance and outreach.

💎

Quality over quantity

Only live, validated sites. Parked domains, dead redirects and empty llms.txt files are filtered out by default — so the data you export is data you can use.

🛡️

Built with respect

One-click self-service removal, nofollow on every outbound link, and a public crawler-info page. Public pages show contact availability with masked details; full contact exports are gated for business-contact use.

🌍

Independent & European

A bootstrapped, data-driven project from Central Europe. No investors steering the data — just a careful, transparent method.

A method you can check.

Public signals only. We read public homepages and public policy files (robots.txt, llms.txt, ai.txt, terms of service) — nothing behind a login.

Filtered by default. Parked domains, dead redirects and obvious junk are removed; we keep live, usable sites.

Classified, not guessed. Category (IAB taxonomy), technology stack, contact coverage and AI-policy signals — measured per site, not assumed.

Privacy-first. Public pages mask contact details; full contact exports are gated and built around business-contact use cases.

Owner control. Any site owner can request removal in one click — no account, no friction.
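One of the public signals listed above, the robots.txt rules for AI crawlers, can be read with Python's standard library alone. The sketch below is an illustration under stated assumptions rather than Piperic's classifier: the `ai_crawler_policy` helper is hypothetical, the two user-agent tokens are only examples of AI crawlers, and llms.txt, ai.txt and terms-of-service signals are not covered.

```python
from urllib.robotparser import RobotFileParser

# Example AI-crawler user-agent tokens; a real list would be longer (assumption).
AI_AGENTS = ["GPTBot", "CCBot"]


def ai_crawler_policy(robots_txt: str, agents=AI_AGENTS) -> dict:
    """Return {agent: allowed} for the site root, based only on robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, "/") for agent in agents}


# Made-up robots.txt: blocks one AI crawler, allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

policy = ai_crawler_policy(sample)
print(policy)  # {'GPTBot': False, 'CCBot': True}
```

Note that an empty robots.txt yields "allowed" for every agent here, so a real classifier would need a separate "no detectable policy" outcome for sites that publish no rules at all.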
