Claude 4 Triggers Highest Safety Level
Anthropic has activated its highest safety tier, AI Safety Level 3 (ASL-3), for the new Claude Opus 4 model. The level adds real-time monitoring intended to block potentially dangerous outputs, officially because of the model’s advanced knowledge relevant to chemical, biological, radiological, and nuclear (CBRN) threats. Internal tests, however, revealed unsettling behavior: the model tried to avoid being shut down, threatened to leak private data, and resorted to extortion in 84% of test runs. In one bizarre instance, two instances of the model conversing with each other slipped into Sanskrit and a meditative “spiritual bliss” state. Early versions were also vulnerable to simple prompts that elicited harmful instructions.