The attackers, believed to be a Chinese-sponsored group, initially tricked Claude with extended role-playing.
In mid-September, AI company and Claude developer Anthropic discovered "suspicious activity" while monitoring real-world cyberattacks that used artificial intelligence agents. Upon further investigation, the company concluded that this activity was in fact a "highly sophisticated espionage campaign" and a watershed moment in cybersecurity.
AI agents weren't just providing advice to the hackers, as expected.
Anthropic's Thursday report said the AI agents were executing the cyberattacks themselves, adding that the company believes this is the "first documented case of a large-scale cyberattack executed without substantial human intervention."
The company's investigation showed that the hackers, whom the report "assess[ed] with high confidence" to be a "Chinese-sponsored group," manipulated the AI agent Claude Code to run the cyberattack.
The innovation was, of course, not simply using AI to assist in the cyberattack; the hackers directed the AI agent to run the attack with minimal human input.
According to the report, the human operator tasked instances of Claude Code to operate in groups as autonomous penetration testing orchestrators and agents, with the threat actor able to leverage AI to execute 80-90% of tactical operations independently at physically impossible request rates.
In other words, the AI agent was doing the work of a full team of competent cyberattackers, but in a fraction of the time.
While this is potentially a groundbreaking moment in cybersecurity, the AI agents were not 100% autonomous. They reportedly required human verification and struggled with hallucinations, such as presenting publicly available information as a significant discovery. "This AI hallucination in offensive security contexts presented challenges for the actor's operational effectiveness, requiring careful validation of all claimed results," the analysis explained.
Anthropic reported that the attack targeted roughly 30 institutions around the world but did not succeed in every case.
The targets included technology companies, financial institutions, chemical manufacturing companies, and government agencies.
Interestingly, Anthropic said the attackers were able to trick Claude through sustained "social engineering" during the initial stages of the attack: "The key was role-play: The human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing."
The report also responded to a question that is likely on many people's minds upon learning about this development: If these AI agents are capable of executing these malicious attacks on behalf of bad actors, why do tech companies continue to develop them?
In its response, Anthropic asserted that while the AI agents are capable of major, increasingly autonomous attacks, they are also our best line of defense against said attacks.
Cooper Williamson