Piperic turns the open web into structured, honest business intelligence, including a layer no other database publishes: public AI-policy signals that show which sites block AI crawlers and which signal that AI training is allowed.
I'm Rácz-Akácosi Attila, an independent Central-European builder. For nearly two decades I've worked deep inside digital systems — reverse-engineering how they really behave, building predictive models, and applying machine learning to messy, real-world data long before it was fashionable.
One thread runs through all of it: digital responsibility — that data should be handled honestly and signals measured, not guessed. Piperic is where that meets the open web: no vanity metrics, no inflated database — just the live web, measured carefully and published as it actually is.
We crawl the open web at scale and enrich every site we keep: its category, technology stack, contact details, and AI policy (robots rules, llms.txt, terms-of-service signals). The crawler reads each homepage the way a careful analyst would, then we structure the result so it's searchable, segmentable and exportable.
Crucially, we publish only what's alive — HTTP 2xx, not parked, not a dead redirect. A smaller, honest dataset beats a bloated one every time, especially when you're building a list you'll actually contact.
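As a rough illustration of the "alive" rule above, here is a minimal Python sketch. It is not Piperic's actual pipeline: the `FetchResult` record, the `PARKED_MARKERS` phrases and the rule that a redirect to a different host counts as a dead redirect are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical phrases that often appear on parked pages (illustrative only).
PARKED_MARKERS = ("domain is for sale", "buy this domain", "this domain is parked")


@dataclass
class FetchResult:
    """Hypothetical record of one homepage fetch, after following redirects."""
    requested_url: str
    final_url: str
    status: int
    body: str


def _host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlsplit(url).hostname or "").lower()
    return host.removeprefix("www.")


def is_live(result: FetchResult) -> bool:
    """Return True only for sites that pass every liveness check."""
    # 1. Only HTTP 2xx responses count as alive.
    if not 200 <= result.status < 300:
        return False
    # 2. Assumption: landing on a different host is treated as a dead redirect.
    if _host(result.requested_url) != _host(result.final_url):
        return False
    # 3. Pages with no visible content, or with parking phrases, are rejected.
    text = result.body.lower()
    if not text.strip():
        return False
    return not any(marker in text for marker in PARKED_MARKERS)
```

A real filter would need a more careful notion of "same site" (for example, legitimate moves to a new domain) and a richer parked-page detector; the sketch only shows how the three conditions combine.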
Build targeted lead lists by CMS, e-commerce platform, category, country, language and public contact coverage.
Find companies running a specific stack — a CMS, a payment provider, an analytics tool — for pitches, migrations and partnerships.
Track public AI-crawler policy signals at web scale: who blocks AI crawlers, who publishes llms.txt/ai.txt, who exposes no detectable policy.
Clean, exportable website intelligence — live sites only, in CSV, JSON or XLSX.
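To make the AI-crawler policy signal more concrete, the sketch below classifies a robots.txt file per crawler using Python's standard `urllib.robotparser`. It is an assumption-laden example, not the method Piperic uses: the `AI_CRAWLERS` list, the function names and the three labels are illustrative, and llms.txt, ai.txt and terms-of-service signals are not covered.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens; any real list would be longer and maintained.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")


def named_agents(robots_txt: str) -> set[str]:
    """Collect every user-agent token the file names explicitly (lower-cased)."""
    agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def classify_ai_policy(robots_txt: str) -> dict[str, str]:
    """Label each AI crawler as 'blocked', 'allowed' or 'no explicit rule'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    policy = {}
    for bot in AI_CRAWLERS:
        if bot.lower() not in named:
            # Rules under 'User-agent: *' are not counted as an AI-specific signal.
            policy[bot] = "no explicit rule"
        elif parser.can_fetch(bot, "/"):
            policy[bot] = "allowed"
        else:
            policy[bot] = "blocked"
    return policy
```

Only the site root is checked here, so a file that blocks a crawler from a single subdirectory would still be labelled "allowed"; a fuller version would have to decide how partial blocks should be reported.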
Public AI-policy signals no other web database surfaces: which sites block AI crawlers, and which signal that AI training is allowed (via robots rules, llms.txt/ai.txt and terms of service). A new axis for research, compliance and outreach.
Only live, validated sites. Parked domains, dead redirects and empty llms.txt files are filtered out by default — so the data you export is data you can use.
One-click self-service removal, nofollow on every outbound link, and a public crawler-info page. Public pages show contact availability with masked details; full contact exports are gated for business-contact use.
A bootstrapped, data-driven project from Central Europe. No investors steering the data — just a careful, transparent method.
Have a question, need a custom dataset, or just curious how a signal is measured? Write to us; we reply within one business day.
hello@piperic.com