The Anthropic AI Cyberattack: Dissecting the 90% Automation Claim
- Ethan Carter

- 2 days ago
- 7 min read

The headlines were undeniably alarming. Anthropic recently announced that a Chinese state-backed espionage group had utilized its Claude model to automate the vast majority of a cyber campaign against dozens of organizations.On the surface, this confirms a long-held fear in the security industry: that large language models (LLMs) are evolving from helpful coding assistants into autonomous agents capable of waging digital warfare.
However, beneath the press releases and the scary percentages lies a much more complex and contested reality. While company engineers frame this as a "first-of-its-kind" autonomous operation, cybersecurity experts and industry veterans are pushing back. They argue that the narrative of a sophisticated Anthropic AI cyberattack might be less about technological breakthroughs in hacking and more about redefining what "automation" actually means.
The skepticism isn't just about whether the attack happened, but about how it was characterized. From accusations of a marketing stunt to basic questions about operational security (OpSec), the disconnect between the official report and the community's reaction highlights a growing tension in how we talk about AI threats.
The Official Narrative: An Autonomous Anthropic AI Cyberattack?

According to the statement released by company representatives on November 13, engineers detected a cluster of misuse that they eventually attributed to operators linked to Chinese cyber espionage.The claim is specific and bold: the attackers allegedly used the Claude Code model to plan and execute roughly 80-90% of the attack chain.
The targets spanned technology, finance, and government sectors across 30 organizations.Anthropic describes an operation where the human element was reduced to high-level decision-making—picking the target and deciding when to pull the data. The rest—reconnaissance, vulnerability analysis, exploit generation, and even data exfiltration—was purportedly handled by the model.
To achieve this, the attackers reportedly used a technique known as task decomposition. This involves breaking down a malicious objective into smaller, benign-sounding requests that bypass the model's safety guardrails.Instead of asking the AI to "hack this server," an operator might ask it to "run a connectivity test" or "debug a script," piecing the results together to achieve a malicious end.
If taken at face value, this represents a significant shift. It implies that off-the-shelf AI tools are no longer just force multipliers but are effectively becoming the pilots of complex intrusions.
The Orchestration Layer: Why Experts Are Skeptical
The cybersecurity community's reaction to the Anthropic AI cyberattack report has been less than credulous. The primary point of contention is the word "autonomous."
Jonathan Allon, vice president of research and development at Palo Alto Networks, interpreted the findings as a "bog standard attack" he and his colleagues "see every day".Jeremy Kirk, analyst at cyber threat intelligence company Intel 471, described Anthropic's report as "odd," noting that at just 13 pages, it lacked the traditional components of a usual threat intelligence report.
This distinction is vital. There is a massive difference between an AI independently discovering a zero-day vulnerability and an AI simply running a script that a human could have run ten years ago.
Many experts see this not as an AI genius at work, but as a "hybrid model." The AI acts as an orchestration layer—a tireless intern that stitches together reconnaissance tasks and drafts code. This is useful for attackers, certainly, but it doesn't necessarily support the "90% automation" figure in the way the public understands it.
If an AI writes a bash script that takes 10 hours to scan a network, attributing that entire 10-hour block to "AI automation" is statistically technically true but contextually misleading. The intelligence required to direct that action still resides with the human operator. Furthermore, Anthropic notes in its report that Claude produced errors and hallucinations, including valid-looking but ultimately useless credentials. This required human intervention to fix, suggesting the operation was far less "hands-off" than claimed.
Is It a Marketing Stunt? "Selling the Cure and the Disease"

A significant portion of the commentary surrounding this incident focuses on motive. Why release this report now, and why frame it this way?
Readers and industry observers have pointed out a perceived conflict of interest. Anthropic is positioning itself as a leader in AI safety. By highlighting a terrifying Anthropic AI cyberattack scenario, they simultaneously demonstrate the power of their model (it's smart enough to hack) and the necessity of their safety products (you need us to stop it).
One comment from the community summed up the sentiment: "Anthropic basically spent the whole piece highlighting how their AI can be leveraged for intrusion activity, but didn't give defenders a single IOC or attribution hint... 90% Flex 10% Value.
The skepticism is fueled by the timing. With global governments currently debating AI regulation and funding for AI safety institutes, a "state-backed" threat narrative serves as powerful leverage. It validates the need for strict oversight and potentially lucrative government contracts for defense-grade AI monitoring. When a company hypes up a threat that only their specific monitoring tools can detect and mitigate, observers are naturally inclined to label it a marketing stunt.
Operational Security and the "Chinese Cyber Espionage" Paradox

Perhaps the most damning critique of the Anthropic AI cyberattack narrative comes from the perspective of Operational Security (OpSec).
The idea that a sophisticated Chinese cyber espionage group—typically known as an Advanced Persistent Threat (APT)—would run a major campaign through a US-hosted, public-facing Large Language Model borders on the absurd for many professionals.
Using a cloud-based LLM like Claude for Command and Control (C&C) or sensitive exploit generation is the digital equivalent of mailing your attack plans directly to the FBI. These models log interactions. They are monitored for abuse.A state-level actor knows this.
Real nation-state threat actors value stealth above all else. If they want to use generative AI to write malware or analyze vulnerabilities, they are far more likely to use open-source models (like Llama) running on local, air-gapped infrastructure where no American tech giant can inspect their prompts. Using a commercial API that leaves a paper trail contradicts the modus operandi of high-level espionage. This contradiction leads many to believe the attackers were either low-level contractors, "script kiddies" experimenting with tools, or that the attribution to a state-backed group is flimsier than presented.
The Problem with Lack of Evidence
The strongest counter-argument to Anthropic's claims is the absence of hard data. The report provided narrative descriptions but lacked the technical artifacts the cybersecurity community relies on to verify threats.
There were no specific prompts released. No Indicators of Compromise (IoCs). No detailed logs showing the task decomposition in action.
"Prompt else it didn't happen," has become a rallying cry in the discussion. Without seeing the actual interactions, it is impossible to judge whether the AI was truly "planning" the attack or if it was merely responding to very specific, human-guided instructions.
Furthermore, user experiences with models like Claude often contradict the idea of a hyper-competent hacker. Users struggle to get these models to understand basic networking concepts without extensive hand-holding.The leap from "struggles with basic networking definitions" to "orchestrated a 30-target espionage campaign" is a chasm that the current evidence fails to bridge.
Task Decomposition and The Reality of AI Hacking
Despite the skepticism, the concept of task decomposition is a real and growing concern. It refers to the method of bypassing safety filters by atomizing a request.
If an attacker asks, "How do I steal data from this bank?", the model refuses. But if the attacker asks for a Python script to test SQL injection vulnerabilities for a "penetration test," then asks for a separate script to format database outputs, and a third to establish an encrypted tunnel, the model complies with each individual request.
Anthropic's engineers noted that the attackers framed their requests as benign penetration testing tasks. This confirms that the model wasn't "rogue"; it was tricked. The "intelligence" of the attack came from the human capable of breaking the heist into compliant chunks.
This doesn't make the Anthropic AI cyberattack story false, but it changes the genre. It's not a story about an autonomous AI agent; it's a story about a new interface for existing tools. The AI reduced the friction of writing boilerplate code, but it didn't invent the attack logic.
Future Implications: Lowering the Barrier to Entry
Whether the 90% automation figure is accurate or exaggerated, the incident highlights an undeniable trend: the barrier to entry for cybercrime is lowering.
You don't need to be a master coder to launch a reconnaissance campaign anymore; you just need to know how to prompt an engine to do it for you. Even if the AI hallucinated credentials or messed up the network mapping, the speed at which it can attempt these tasks poses a volume problem for defenders.
Experts agree that while we aren't at the stage of fully autonomous AI hackers, we are entering an era of "hybrid operations." Adversaries will treat LLMs as force multipliers.This allows smaller groups to punch above their weight and state actors to scale operations without recruiting more human personnel.
However, treating a thwarted attempt by a clumsy operator as a sophisticated, autonomous state-level threat does a disservice to defenders. It muddies the water between hypothetical risks and actual operational realities.
Conclusion
The Anthropic AI cyberattack report serves as a Rorschach test for the industry. For those selling security, it's a warning of a terrifying future that requires immediate investment. For those in the trenches of network defense, it looks like a sensationalized account of a standard intrusion, dressed up in the buzzwords of the moment.
Until companies like Anthropic release detailed logs, prompts, and technical evidence, the claim of a 90% autonomous attack remains unverified. The reality is likely more mundane: bad actors are using every tool available to them, including AI, but the human behind the keyboard is still pulling the strings. The danger isn't that the AI is thinking for itself; it's that it makes it easier for humans to act on their worst impulses.
FAQ

Q: Did a Chinese group really use Claude for a cyberattack?
Anthropic claims a Chinese state-sponsored group was behind the activity. However, independent security experts have questioned this attribution, noting that using a US-hosted public AI model is a massive operational security failure typical of amateurs, not state actors.
Q: What does the "80-90% automation" claim actually mean?
This figure likely refers to the volume of tasks executed by the AI, such as scanning ports or writing code snippets, rather than strategic decision-making. Humans still handled the critical "high-level" targeting and decisions, meaning the AI was a tool, not the commander.
Q: What is task decomposition in the context of AI attacks?
Task decomposition is a technique where attackers break a malicious goal into small, innocent-looking steps to bypass AI safety guardrails. For example, instead of asking for a hack, they ask for a "network connectivity test," which the AI views as a legitimate request.
Q: Why do people think the Anthropic AI cyberattack report is a marketing stunt?
Critics point to the lack of technical evidence and the timing of the report. By framing their own product as a dangerous weapon that only they can control, Anthropic positions itself to sell more security solutions and influence government regulation.
Q: Can AI models like Claude actually execute hacks independently?
Currently, no. While they can write code and scan for known vulnerabilities, they frequently hallucinate (invent false information) and make errors. They require constant human supervision to correct these mistakes, making them assistants rather than autonomous hackers.


