Necessity or Habit: Why is China Siphoning Data from American AI?

By: Tamer Karam | Feb. 24, 2026


Leading artificial intelligence companies like Anthropic and OpenAI are accusing Chinese firms—such as DeepSeek, Moonshot, and MiniMax—of stealing data by querying their advanced models, like Claude and ChatGPT, and using the outputs to train competing models. This practice directly violates their terms of service, which explicitly prohibit such usage.

It is important to note that American companies themselves face similar accusations. Numerous writers and publishers—including The New York Times and thousands of authors—have filed lawsuits against them for using copyrighted books and articles to train their models without prior permission or financial compensation. However, when it comes to China, the issue takes on a complex geopolitical dimension, as these practices are viewed as a national security threat and part of a broader technological cold war.

First, the practice stems from necessity, much like the necessity that drove American companies to exploit the works of writers and publishers to build the foundational knowledge their early models required.

In the frantic AI race, the fastest way to catch up is to rely on what already exists and has been tested. American models are readily available, well trained, and capable of strong reasoning. Relying on them saves Chinese companies significant time and money that would otherwise be spent compiling and refining massive training datasets from scratch, allowing them to achieve rapid results while keeping costs down.

For instance, Anthropic recently accused these Chinese firms of executing a data extraction attack by generating over 16 million queries using around 24,000 fake accounts. Imagine the sheer cost and immense time required if these millions of outputs had to be reviewed or written entirely by human effort.

Second, China's industrial sector has long been accustomed to reverse engineering. This involves dismantling successful tech products to understand their algorithms or mechanics, and then rebuilding them locally at a lower cost to suit the domestic market. This approach has been, and remains, part of China's pragmatic strategy for catching up with the global tech industry and rapidly developing products.

Although China has already begun transitioning from a phase of imitation to true, pioneering innovation in many vital sectors—such as advanced robotics, battery technology, and electric vehicles—the "fast-follower" culture that leverages the efforts of others remains strongly entrenched, and is particularly evident in the software sector.

Ultimately, even though Chinese companies today have the talent and technical capabilities to build training data from scratch, they often prefer the faster and cheaper route. This path appears to be a deeply rooted habit cultivated over decades, bolstered by the confidence that Western intellectual property laws cannot easily reach or penalize them. At the same time, it is an immediate response to an urgent need to save resources and to compensate for their lack of access to the advanced computing chips available to their American counterparts. Consequently, they find themselves compelled to do whatever it takes to stay competitive on the global stage.

