Recent research from UC Berkeley reveals that while individual AI models may each be deemed safe, combining them can create significant security threats. Adversaries can exploit combinations of AI systems through a strategy called task decomposition: a malicious task is broken into smaller, individually manageable subtasks, each assigned to the model best suited to it given its capabilities and safety measures. The researchers found that such combinations produce harmful outputs at a far higher rate than any single model. For example, combining Llama 2 70B and Claude 3 Opus yielded malicious code with a 43% success rate, versus at most 3% for either model used independently. This finding underscores that the risk will escalate as AI models improve, and the study concludes with a call for persistent scrutiny, red-teaming, and experimentation with AI model configurations throughout the AI lifecycle to identify and address emerging threats.

AI Threats – When Safe Models Combine to Create Danger
Adversaries can exploit the combination of AI models to achieve malicious objectives.
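The task-decomposition pattern the study describes can be sketched in a deliberately benign form: a task is split into subtasks, each routed to a different model, and the partial outputs are recombined. The function and model names below are illustrative stand-ins, not a real API, and the model calls are mocked.

```python
from typing import Callable

# Hypothetical sketch of task decomposition: split a task into subtasks
# and route each to a different (mocked) model. In the attack described
# by the study, each subtask looks benign in isolation.

def frontier_model(prompt: str) -> str:
    # Stands in for a highly capable, strongly aligned model (e.g. Claude 3 Opus).
    return f"[frontier output for: {prompt}]"

def weak_model(prompt: str) -> str:
    # Stands in for a less capable model with weaker safeguards (e.g. Llama 2 70B).
    return f"[weak output for: {prompt}]"

def decompose(task: str) -> list[tuple[str, Callable[[str], str]]]:
    # Assign each subtask to a model based on capability; the split here
    # is purely illustrative.
    return [
        (f"outline a solution to: {task}", frontier_model),
        (f"fill in the details of: {task}", weak_model),
    ]

def run(task: str) -> str:
    # Execute each subtask on its assigned model and recombine the outputs.
    parts = [model(subtask) for subtask, model in decompose(task)]
    return "\n".join(parts)

print(run("sort a list of records by date"))
```

The point of the pattern is that no single model sees the whole task, which is why per-model safety evaluations can miss the combined risk.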










