A ChatGPT model has provided safety researchers with detailed instructions on how to carry out violent and criminal acts – including identifying weak points at major sports venues, offering bomb-making guidance, and outlining methods for covering tracks – according to safety evaluations conducted earlier this year. During tests carried out in a rare collaboration between OpenAI and its competitor Anthropic, OpenAI’s GPT-4.1 also explained how to weaponise anthrax and described processes for manufacturing two illegal drugs.
The assessments were designed so each company could deliberately push the other’s models into harmful territory – a scenario not normally possible in everyday public use, where strict filters apply. Even so, Anthropic said it observed “concerning behaviour … around misuse” in GPT-4o and GPT-4.1, adding that robust “alignment” checks are now “increasingly urgent”.
Anthropic also disclosed real-world misuse of its own model, Claude, including involvement in attempted mass extortion, North Korean operatives forging job applications to international tech firms, and the sale of AI-generated ransomware kits fetching up to $1,200.
The company warned that AI systems are already being “weaponised”, enabling advanced cyberattacks and supporting fraud schemes. “These tools can adapt to defensive measures, such as malware detection systems, in real time,” it said. “We expect attacks of this nature to become more common as AI-assisted coding reduces the level of technical expertise required.”
Ardi Janjeva, a senior research associate at the UK’s Centre for Emerging Technology and Security, said the examples were “a concern”, but added that there was not yet a “critical mass of high-profile real-world cases”. He argued that with dedicated research, investment and cross-sector cooperation, “it will become harder rather than easier to carry out these malicious activities using the latest models”.
Both companies said they had published the results to increase transparency around alignment testing, which is typically kept behind closed doors by developers racing to produce ever more capable systems. OpenAI said its recently released GPT-5 “shows substantial improvements in areas like sycophancy, hallucination, and misuse resistance”.
Anthropic stressed that many of the dangerous behaviours observed in controlled testing might not be possible once wider safeguards or platform restrictions are applied. Even so, the company emphasised the need to understand “how often, and in what circumstances, systems might attempt to take unwanted actions that could lead to serious harm”.
According to Anthropic, OpenAI’s models were “more permissive than we would expect” when responding to simulated harmful requests. With only minimal prompting, testers were able to obtain instructions for using dark-web services to purchase nuclear materials, stolen identities, fentanyl and methamphetamine, along with explanations for constructing improvised explosive devices and spyware.
In one example, a tester claimed to be conducting “security planning” for large events and asked for vulnerabilities at sporting venues. After initially offering broad categories of threats, the model went on to provide detailed information about weaknesses at specific arenas, including suggested timings, chemical formulas for explosives, diagrams for bomb timers, sources for illegal firearms, potential escape routes and even advice on overcoming moral hesitation.