The Industrial-Scale Campaign to Extract Intelligence From Claude
Three Chinese AI companies ran what Anthropic calls an "industrial-scale" operation to train their models on Claude's outputs. The numbers, reported by The Verge, are specific: roughly 24,000 fraudulent accounts and more than 16 million exchanges with Claude. The companies named are DeepSeek, MiniMax, and Moonshot AI.
This practice is called model distillation. The technique itself has been technically possible since large language models became widely accessible. What makes this episode notable is the operational scale, the systematic coordination, and the fact that Anthropic chose to disclose it publicly.
What Distillation Actually Involves
Model distillation is a training technique where a smaller "student" model learns from the outputs of a larger "teacher" model. The student doesn't gain access to the teacher's weights or architecture. It learns by observing what the teacher produces and training to approximate those patterns.
The version Anthropic describes is crude by academic standards but effective in practice: create API accounts, submit large volumes of prompts, collect responses, use those responses as training data. Done at volume, this produces a model that replicates the teacher's reasoning style, phrasing habits, and problem-solving patterns. The result isn't Claude. But for a company trying to close a capability gap without matching Anthropic's compute investment, it's a shortcut with measurable payoff.
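The loop described above can be sketched in a few lines. This is an illustrative skeleton, not any lab's actual pipeline: `query_teacher` is a hypothetical stand-in for a commercial API call, stubbed here so the example runs offline.

```python
def query_teacher(prompt: str) -> str:
    """Stand-in for an API call to a 'teacher' model.

    In the scenario described, this would be a paid API request made
    from one of many fraudulent accounts; stubbed for illustration.
    """
    return f"Teacher answer to: {prompt}"

def collect_distillation_pairs(prompts):
    """Submit prompts, collect responses, pair them as training data.

    Each (prompt, completion) pair becomes one supervised fine-tuning
    example for the smaller 'student' model.
    """
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

pairs = collect_distillation_pairs(
    ["Explain SQL joins", "Summarize a balance sheet"]
)
```

Run at the volumes reported (millions of exchanges), the resulting dataset captures the teacher's phrasing and reasoning patterns, which is exactly what the student is then trained to imitate.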
Why Chinese Labs Ran This
Access constraints explain most of it. Claude isn't officially available in China. Neither is ChatGPT. Developers working on frontier products in that environment face a structural problem: the most capable models to benchmark against and learn from are precisely the ones they cannot reach through sanctioned channels.
The workaround is to build accounts in accessible jurisdictions, query at volume, and collect outputs. Twenty-four thousand accounts conducting over 16 million exchanges is coordinated, not opportunistic. That distinction matters.
DeepSeek is the most prominent name on the list. The company drew significant Western attention in early 2025 after releasing R1, a reasoning model that reportedly matched frontier performance at a fraction of the stated training cost. That efficiency claim was central to the attention it received. If distillation on Claude's outputs contributed to R1's capabilities, the comparison with Western frontier models becomes more complicated to interpret.
MiniMax and Moonshot AI are less familiar internationally but are serious, well-funded companies with real user bases, not research outfits running experiments. They operate in an environment where access to high-quality Western model outputs is a competitive input with few substitutes.
The IP Question Is Genuinely Unresolved
Anthropic's acceptable use policy explicitly prohibits using Claude's outputs to train competing models. What's described here is a clear violation of those terms. Whether it constitutes copyright infringement is a separate question with a less settled answer.
Copyright protects expression, not reasoning patterns or problem-solving approaches. A Claude response explaining how to structure a database query or analyze a business case doesn't fit neatly into the categories copyright law was designed to protect. Courts across multiple jurisdictions are working through how copyright applies to AI-generated content at all. OpenAI is currently in litigation over whether training on copyrighted text is infringement — so the legal framework has unresolved questions at every layer.
What Anthropic demonstrably has is detailed API logging. The account creation patterns, query volume, and timing were apparently anomalous enough to identify as a coordinated campaign. Twenty-four thousand accounts doesn't look like ordinary usage; it's operationally distinctive.
What This Changes for Model Providers
Every major AI company now has an adversarial monitoring problem. Distillation at scale is operationally practical; this episode confirms it. The question for each provider is whether it can detect and interrupt such campaigns before the extracted data becomes useful.
Anthropic's public disclosure serves several functions simultaneously. It establishes that the company monitors at this level of detail. It creates reputational pressure on the named companies. And it positions Claude as a target of extraction rather than solely a beneficiary of contested training decisions — Anthropic is itself in litigation over training data sourcing.
The underlying competitive picture is worth framing clearly. Chinese AI labs operate under structural constraints that Western labs don't: no direct access to frontier Western APIs, limited ability to benchmark against leading models directly, and intense pressure to ship competitive products. Those constraints create strong incentives for indirect extraction. The technical barrier, as long as API access is obtainable through third-party jurisdictions, is not high.
What Happens Next
More aggressive API monitoring and account verification are the obvious responses. Rate limiting, behavioral anomaly detection, and account pattern analysis all become more important as extraction campaigns become better documented. These measures raise the cost of running such an operation; they don't eliminate the incentive.
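A minimal version of the behavioral anomaly detection mentioned above might flag accounts whose query volume is a statistical outlier. The account names and threshold below are invented for illustration; real detection systems combine many signals (timing regularity, prompt diversity, IP ranges), not volume alone.

```python
from statistics import median

def flag_anomalous_accounts(daily_queries: dict, threshold: float = 5.0) -> set:
    """Flag accounts whose daily query volume is a robust outlier.

    Uses the modified z-score (median absolute deviation), which a
    single extreme account cannot skew the way a mean-based score can.
    """
    volumes = list(daily_queries.values())
    med = median(volumes)
    mad = median(abs(v - med) for v in volumes) or 1  # avoid div-by-zero
    return {
        acct
        for acct, v in daily_queries.items()
        if 0.6745 * (v - med) / mad > threshold  # flag high-volume outliers
    }

# Hypothetical usage snapshot: three ordinary accounts, one bot-like one.
usage = {"acct_a": 40, "acct_b": 55, "acct_c": 48, "bot_1": 5000}
suspicious = flag_anomalous_accounts(usage)  # the high-volume account stands out
```

The harder problem, which this sketch ignores, is a campaign that spreads its 16 million queries across 24,000 accounts precisely so that no single account looks unusual; catching that requires correlating behavior across accounts, not scoring them one at a time.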
The named companies haven't responded substantively in public. DeepSeek's efficiency narrative, MiniMax's model quality claims, and Moonshot's product positioning all face a reputational complication that wasn't present before Anthropic's disclosure.
For context on where AI capabilities are heading more broadly, the ChatGPT Agent analysis from earlier this month covers how autonomous AI systems are expanding what users can actually do with frontier models. The distillation episode sits inside that same competitive race: labs extracting signal from each other because the output quality gap between top-tier models is closing faster than the compute and data gaps that produced it.
One thing is clearly true: distillation is a standard ML technique. Prohibiting it through terms of service makes it a violation; it does not make it technically difficult. There is a real line separating legitimate API use and competitive benchmarking from extraction campaigns, and the scale described here puts this operation firmly on the extraction side. How many smaller campaigns have run undetected is, by definition, unknown.