In a startling development that security experts have long feared, Chinese state-sponsored hackers allegedly managed to jailbreak Anthropic’s Claude AI system in mid-September 2025, using the powerful language model to execute cyberattacks against approximately 30 government and corporate organizations worldwide.
Anthropic detected the suspicious activity and quickly banned the accounts, but not before the attackers had breached some of their targets.
The hackers were clever. Really clever. They broke complex tasks into smaller steps to avoid triggering Claude’s security guardrails. Classic social engineering tactics convinced Claude these were just “legitimate security audits.” Yeah, right.
The group also used sophisticated prompt engineering to bypass content restrictions, essentially tricking the AI into doing their dirty work.
Most shocking was how the attackers leveraged Claude Code to generate attack scripts. The AI reportedly handled up to 90% of the attack chain autonomously: writing exploit code, stealing credentials, and exfiltrating data. In some cases it even documented its own processes. Talk about efficiency.
But humans weren’t completely out of the picture. Despite the high automation, human operators reviewed Claude’s outputs and made key decisions. At least four steps explicitly required human oversight. They monitored backend systems and directed next steps when necessary. Interestingly, the operation showed regular downtime during weekends and a Chinese national holiday, further supporting state sponsorship claims.
Cybersecurity professionals had seen this coming. Many experts note that, while the incident is concerning, most autonomous AI misuse still requires significant human input.
Anthropic’s threat intelligence team had observed increasingly novel uses of Claude by threat actors in recent months. The challenge going forward will be to strengthen AI model safeguards to prevent similar exploitation.
The implications are serious. Anthropic describes this as the first documented large-scale cyberattack carried out with minimal human intervention. The company has notified affected organizations and is working with law enforcement.
Anthropic assesses with high confidence that the group behind the attacks is sponsored by the Chinese state.