Rules of engagement for AI crawlers that sites can actually enforce.
AI crawlers extract value from public content without sending traffic back to the sites they crawl. Major AI companies scrape billions of pages to train systems that compete directly with content creators. The economics are inverted: creators bear the hosting and bandwidth costs while AI companies capture the value without compensation or even attribution.
Publishers like The New York Times and CNN have responded by blocking crawlers such as OpenAI's GPTBot and Common Crawl's CCBot. But blocking is binary and blunt: there's no way to say "yes to indexing my pages for search results, no to training AI systems on them." The tools available today don't give sites real leverage.
Existing tools fail in three ways. The robots.txt standard lets sites declare which crawlers may access which parts of a site, but compliance is voluntary and ignoring it carries no technical consequence. It has no vocabulary for nuanced preferences like "allow for search, disallow for training." And standard legal language in terms of service doesn't change crawler behavior; violations are hard to detect and harder to prove.
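To make the gap concrete, here is a minimal sketch using Python's standard urllib.robotparser against an illustrative robots.txt (the agent list and example.com URLs are placeholders). It shows that the format can only express per-agent allow/disallow, and that the check runs inside the crawler, so a crawler that skips it faces no consequence.

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt that blocks AI training crawlers by user agent.
# Nothing here is enforced by the server; the directives are advisory.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks before fetching...
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True

# ...but a non-compliant crawler can simply skip this check and fetch anyway,
# and the file has no way to say "allow for search, disallow for training."
```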
Semiautonomous Systems builds infrastructure that lets sites set, enforce, and measure rules of engagement for AI crawlers. The goal is to rebalance power so creators and publishers have a say in how their content is used.
We believe that rules of engagement should be expressible, enforceable, and measurable. Sites should be able to declare nuanced preferences beyond simple "allow" or "disallow."
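As an illustration only, a richer declaration might key rules by purpose rather than by crawler. The purposes and values below are hypothetical, not an existing standard or a product format.

```python
# Hypothetical purpose-keyed policy (illustrative only). Where robots.txt can
# only allow or disallow a named agent, this distinguishes why content is fetched.
CONTENT_POLICY = {
    "search-indexing": "allow",        # e.g., indexing pages for search results
    "ai-training":     "disallow",     # e.g., collecting pages as training data
    "ai-inference":    "conditional",  # e.g., permitted only with license or attribution
}
```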
These preferences need technical controls at the network edge (at CDN or proxy layers, before requests reach your servers) that make violating rules costly and unattractive. Honest, authorized access should become easier and cheaper than cheating. And there must be instrumentation that tracks compliance, detects violations, and provides evidence for accountability.
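A sketch of what enforcement at the edge could look like, assuming a purpose-keyed policy like the one above and a shared-secret token issued to authorized crawlers. The agent-to-purpose mapping, the secret, and the status codes are illustrative; a real deployment would verify crawler identity with more than a user-agent string.

```python
import hashlib
import hmac
from typing import Optional

# Illustrative mapping only; user-agent strings alone are not trustworthy.
AGENT_PURPOSE = {
    "GPTBot": "ai-training",
    "CCBot": "ai-training",
    "Googlebot": "search-indexing",
}

SHARED_SECRET = b"placeholder-secret"  # hypothetical secret issued to authorized crawlers


def is_authorized(path: str, token: str) -> bool:
    """Verify an HMAC token an authorized crawler presents with each request."""
    expected = hmac.new(SHARED_SECRET, path.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)


def decide(policy: dict, user_agent: str, path: str, token: Optional[str] = None) -> int:
    """Return the HTTP status the edge should send: 200 to serve, 403 to refuse."""
    if token and is_authorized(path, token):
        return 200  # authorized access is the easy, reliable path
    purpose = AGENT_PURPOSE.get(user_agent, "unknown")
    if policy.get(purpose) == "allow":
        return 200
    return 403  # disallowed or undeclared purposes are refused before the origin


policy = {"search-indexing": "allow", "ai-training": "disallow"}
print(decide(policy, "Googlebot", "/article"))  # 200
print(decide(policy, "GPTBot", "/article"))     # 403
```

Refusing at the CDN or proxy layer keeps the cost of a blocked request off the origin, while the token path makes authorized access cheaper than evasion.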
This shifts the economics: instead of "scrape everything and ask forgiveness later," crawlers must respect declared preferences or face technical consequences. Sites gain leverage through technical enforcement, not just legal text.
From "please don't" to technical enforcement.
We ship infrastructure that makes rules of engagement real through standards, technical controls, measurement, and detection.
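As a toy sketch of the measurement and detection side, assuming edge access logs are available: the log records, agent names, and paths below are invented for illustration.

```python
from collections import Counter

# Invented, simplified log records: (user_agent, path, status). Real
# instrumentation would read CDN or proxy access logs.
REQUESTS = [
    ("GPTBot", "/articles/1", 403),
    ("GPTBot", "/articles/2", 403),
    ("CCBot", "/articles/2", 200),
    ("Googlebot", "/articles/1", 200),
]

TRAINING_AGENTS = {"GPTBot", "CCBot"}             # declared off-limits for training
DISALLOWED_PATHS = {"/articles/1", "/articles/2"}

# Measurement: how much traffic each crawler generates.
volume = Counter(agent for agent, _, _ in REQUESTS)

# Detection: a training crawler that received content from a disallowed path
# despite the declared preference -- evidence that supports accountability.
violations = [
    (agent, path)
    for agent, path, status in REQUESTS
    if agent in TRAINING_AGENTS and path in DISALLOWED_PATHS and status == 200
]

print(volume.most_common())  # [('GPTBot', 2), ('CCBot', 1), ('Googlebot', 1)]
print(violations)            # [('CCBot', '/articles/2')]
```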
Our first product, VENOM, brings these ideas to individual content surfaces.
We're building this infrastructure with design partners: content sites, publishers, and platforms who want to experiment with rules of engagement and anti-scraping strategy. If you're dealing with unauthorized AI crawlers, want to enforce preferences that declare how your content can be used, or need better visibility into who's accessing your content, we'd like to hear from you.