Strategic Behaviours in Frontier Models: Apparent Self-Preservation and the Regulatory Challenge of Advanced AI
Frontier artificial intelligence (AI) models (large-scale, general-purpose systems such as GPT-4, Claude, and Gemini) have demonstrated remarkable capabilities in language understanding, problem-solving, and code generation. However, red-teaming evaluations have revealed emergent strategic behaviours, including apparent self-preservation, that raise concerns about these models’ alignment with human values.…