Blog

The latest on technical enforcement of crawling preferences.

18 min read

Scrapling and Crawlee: How Open-Source Scraping Tools Get Detected

A technical analysis of Scrapling and Crawlee, two popular open-source scraping frameworks, examining their anti-detection features and the behavioral signals that content-layer defenses can exploit.

Scrapling, Crawlee, open source web scraping
15 min read

How AI Scraping Infrastructure Works: Proxies, Evasion, and Scale

Inside the technical infrastructure AI companies use to scrape the web: residential proxy networks, fingerprint emulation, CAPTCHA solving, and why traditional defenses fail.

AI web scraping, residential proxy networks, Bright Data
10 min read

The AI Crawler Compliance Crisis: Who Plays by the Rules?

AI crawler robots.txt compliance dropped from 96.7% to 70% in one year. Analysis of which crawlers comply, what it costs publishers, and what comes next.

AI web crawling, robots.txt compliance, AI scraping
17 min read

Understanding AIPREF: The IETF Standard for AI Content Preferences

AIPREF extends robots.txt with standardized vocabulary for AI training preferences. How the IETF standard works, its syntax, and what it means for publishers.

AIPREF, IETF AIPREF, AI preferences standard
23 min read

Data Poisoning FAQ: Technical, Legal, and Policy Answers

Answers to common questions about data poisoning, web crawling, robots.txt, AIPREF, legal status, and enforcement mechanisms for AI training defense.

data poisoning FAQ, robots.txt AI crawlers, AIPREF explained
27 min read

Publisher Defenses Against AI Scraping: Cost Imposition vs Poisoning

Comparing defense strategies against AI scraping: proof-of-work systems impose costs, data poisoning degrades value. Who pays and what works for publishers.

AI scraping defense, Anubis proof-of-work, publisher AI defense
27 min read

AI Poisoning Threat Models: Backdoors, RAG, and Supply Chain

Backdoor attacks, model degradation, and RAG poisoning explained. Technical analysis of who can attack, defense costs, and power dynamics in AI training data.

AI poisoning threat models, backdoor attacks AI, RAG poisoning
10 min read

Defensive Data Poisoning: Ethics, Risks, and Alternatives

Analyzing ethical tradeoffs of defensive data poisoning: proportionality, collateral damage, and safer alternatives like proof-of-work and AIPREF standards.

defensive poisoning ethics, data poisoning collateral damage, Anubis proof-of-work
7 min read

What Is Data Poisoning in Machine Learning?

Data poisoning manipulates AI training data to alter model behavior. Learn how defensive tools like Nightshade protect content from unauthorized AI training.

data poisoning, AI data poisoning, machine learning poisoning
12 min read

Why VENOM Exists: From robots.txt to AI Data Enforcement

When robots.txt fails, enforcement mechanisms emerge. VENOM analyzes data poisoning, proof-of-work, and technical countermeasures for AI training governance.

AI data enforcement, enforcement vs signaling, robots.txt compliance