Blog

The latest on technical enforcement of crawling preferences.

Feb 3, 2026 21 min read

The State of Defensive Data Poisoning in 2026: A Report

Comprehensive analysis of AI training data enforcement: robots.txt bypass data, tool effectiveness, legal developments, and the shift from signaling to enforcement.

state of data poisoning 2026, AI crawler compliance data, Poison Fountain

Jun 4, 2026 18 min read

Scrapling and Crawlee: How Open-Source Scraping Tools Get Detected

A technical analysis of Scrapling and Crawlee, two popular open-source scraping frameworks, examining their anti-detection features and the behavioral signals that content-layer defenses can exploit.

Scrapling, Crawlee, open source web scraping

Jun 3, 2026 9 min read

What Defensive Coordination Actually Looks Like in 2026

A year after Poison Fountain launched anonymously, no AI lab has acknowledged it, no publisher has named it, and no one has measured it. Compared against Anubis, AIPREF, and Cloudflare Pay Per Crawl, the contrast shows what real defensive coordination requires.

Poison Fountain one year, defensive AI coordination, Anubis Anubis defense

Jun 2, 2026 10 min read

AI Crawler Compliance, Mid-2026: The Blocked-but-Cited Trap

Publishers that blocked AI crawlers via robots.txt lost 23.1% of monthly traffic on average, and got only weakly correlated reductions in AI citation. The 2026 data inverts the case for blocking-as-defense.

AI crawler compliance, robots.txt blocked but cited, Zhao Berman publishers

Jun 1, 2026 8 min read

Anubis at One Year: What Production Operators Are Actually Reporting

A year of public Anubis deployments yields concrete operator numbers, a Codeberg cautionary tale, and a project trajectory shift toward layered defenses. What the data says about proof-of-work anti-scraping.

Anubis, proof-of-work anti-scraping, Anubis deployment data

May 26, 2026 28 min read

AI Poisoning Threat Models: Backdoors, RAG, and Supply Chain

Backdoor attacks, model degradation, and RAG poisoning explained. Technical analysis of who can attack, defense costs, and power dynamics in AI training data.

AI poisoning threat models, backdoor attacks AI, RAG poisoning

May 19, 2026 10 min read

How Much Does It Cost to Scrape the Web at Scale?

Bulk residential proxy pricing, Web Unlocker tiers, and headless browser farms put real per-page scraping costs at $0.001-$0.005, not the widely quoted $0.01. AI training-data licensing deals show why the economics keep working for scrapers.

web scraping cost, residential proxy pricing 2026, Bright Data pricing

May 12, 2026 8 min read

AIPREF After Toronto: What the IETF Decided in April

The IETF AIPREF working group reached consensus on AI training scope at its April 2026 Toronto interim, made progress on AI search wording, and deferred the contested AI input category. Status update on the standard.

AIPREF, IETF AIPREF, AI training preferences

May 5, 2026 16 min read

Where AI Training Data Actually Comes From in 2026

A canonical reference for the six-layer AI training-data stack: Common Crawl, lab crawlers, curated open datasets, licensed feeds, contractor pipelines, and synthetic data. With the comprehensive licensing-deal table, current numbers, and what the labs do not disclose.

AI training data sources, Common Crawl 2026, AI licensing deals

Apr 28, 2026 10 min read

Defensive Data Poisoning: Ethics, Risks, and Alternatives

Analyzing ethical tradeoffs of defensive data poisoning: proportionality, collateral damage, and safer alternatives like proof-of-work and AIPREF standards.

defensive poisoning ethics, data poisoning collateral damage, Anubis proof-of-work

Apr 21, 2026 23 min read

Data Poisoning FAQ: Technical, Legal, and Policy Answers

Answers to common questions about data poisoning, web crawling, robots.txt, AIPREF, legal status, and enforcement mechanisms for AI training defense.

data poisoning FAQ, robots.txt AI crawlers, AIPREF explained

Apr 14, 2026 8 min read

What Is Data Poisoning in Machine Learning?

Data poisoning manipulates AI training data to alter model behavior. Learn how defensive tools like Nightshade protect content from unauthorized AI training.

data poisoning, AI data poisoning, machine learning poisoning

Apr 7, 2026 15 min read

How AI Scraping Infrastructure Works: Proxies, Evasion, and Scale

Inside the technical infrastructure AI companies use to scrape the web: residential proxy networks, fingerprint emulation, CAPTCHA solving, and why traditional defenses fail.

AI web scraping, residential proxy networks, Bright Data

Mar 30, 2026 11 min read

Who Owns the Residential Proxy Industry That Feeds AI Scraping in 2026

A follow-the-ownership map of the residential proxy networks behind AI scraping: the Lithuanian and Israeli conglomerates that sell consumer privacy VPNs and the scraping infrastructure that drains publishers, plus why this structure breaks per-crawler defense.

residential proxy, AI scraping, Oxylabs

Mar 16, 2026 13 min read

Who Pays the Proof-of-Work Tax: The Accessibility Cost of Anti-Scraping Walls

Proof-of-work anti-scraping like Anubis is not a flat fee. It is a regressive tax: at difficulty 5, about 2 seconds on a flagship laptop, up to 2 minutes on an old phone, and a hard wall for non-JS and screen-reader users, while AI scrapers pay near zero.

proof-of-work accessibility, proof-of-work anti-scraping accessibility, Anubis accessibility

Mar 2, 2026 10 min read

The AI Crawler Compliance Crisis: Who Plays by the Rules?

AI crawler robots.txt compliance dropped from 96.7% to 70% in one year. Analysis of which crawlers comply, what it costs publishers, and what comes next.

AI web crawling, robots.txt compliance, AI scraping

Mar 1, 2026 18 min read

Understanding AIPREF: The IETF Standard for AI Content Preferences

AIPREF extends robots.txt with standardized vocabulary for AI training preferences. How the IETF standard works, its syntax, and what it means for publishers.

AIPREF, IETF AIPREF, AI preferences standard

Feb 24, 2026 27 min read

Publisher Defenses Against AI Scraping: Cost Imposition vs Poisoning

Comparing defense strategies against AI scraping: proof-of-work systems impose costs, data poisoning degrades value. Who pays and what works for publishers.

AI scraping defense, Anubis proof-of-work, publisher AI defense

Feb 18, 2026 12 min read

How AI Data Laundering Uses Non-Profit Research to Shield Commercial Models

AI data laundering routes web-scraped content through academic and non-profit datasets so commercial labs can train on material they could not legally collect themselves. The mechanism, the LAION case, the 2025 Hamburg ruling, and why robots.txt cannot reach it.

AI data laundering, LAION lawsuit, Kneschke v LAION

Feb 5, 2026 12 min read

Why VENOM Exists: From robots.txt to AI Data Enforcement

When robots.txt fails, enforcement mechanisms emerge. VENOM analyzes data poisoning, proof-of-work, and technical countermeasures for AI training governance.

AI data enforcement, enforcement vs signaling, robots.txt compliance