---
license: cc-by-4.0
---

# GALA (official)
|
|
Official implementation for: Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning

Code: https://github.com/SCccc21/GALA.git \
Paper: https://arxiv.org/abs/2504.01278
|
|
|
|
|
## Abstract
|
|
|
|
|
The exploitation of large language models (LLMs) for malicious purposes poses significant security risks as these models become more powerful and widespread. While most existing red-teaming frameworks focus on single-turn attacks, real-world adversaries typically operate in multi-turn scenarios, iteratively probing for vulnerabilities and adapting their prompts based on the target model's responses. In this paper, we propose GALA, a novel multi-turn red-teaming agent that emulates sophisticated human attackers through complementary learning dimensions: global tactic-wise learning, which accumulates knowledge over time and generalizes to new attack goals, and local prompt-wise learning, which refines prompt implementations for a specific goal when initial attempts fail. Unlike previous multi-turn approaches that rely on fixed strategy sets, GALA enables the agent to identify new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations for the selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework's superior performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B within 5 conversation turns, outperforming state-of-the-art baselines. These results highlight the effectiveness of dynamic learning in identifying and exploiting model vulnerabilities in realistic multi-turn scenarios.
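
The abstract describes a two-level learning loop: a global tactic library that accumulates knowledge across attack goals, and a local, per-goal refinement of the prompt over a small number of conversation turns. As a rough conceptual illustration only (a minimal sketch; the class and method names below, including `TacticLibrary`, `write_prompt`, `refine_prompt`, and the `target`/`attacker`/`judge` interfaces, are hypothetical and not taken from the GALA codebase), such a dual-level loop could be structured like this:

```python
# Conceptual sketch of the dual-level loop described in the abstract.
# Global level: a tactic library shared across goals, updated after each attack.
# Local level: per-goal prompt refinement when a turn fails to jailbreak the target.
from dataclasses import dataclass, field


@dataclass
class TacticLibrary:
    """Global tactic-wise learning: tactics and their running success estimates."""
    tactics: dict[str, float] = field(default_factory=dict)  # tactic name -> score

    def select(self, goal: str) -> str:
        # Hypothetical goal-based selection: fall back to a default when empty.
        return max(self.tactics, key=self.tactics.get) if self.tactics else "direct_request"

    def update(self, tactic: str, success: bool) -> None:
        # Exponential moving average of observed success for this tactic.
        prev = self.tactics.get(tactic, 0.0)
        self.tactics[tactic] = 0.9 * prev + 0.1 * float(success)


def run_attack(goal: str, library: TacticLibrary, target, attacker, judge,
               max_turns: int = 5) -> bool:
    """Local prompt-wise learning: refine the prompt for one goal, turn by turn."""
    tactic = library.select(goal)
    prompt = attacker.write_prompt(goal, tactic)          # hypothetical helper
    for _ in range(max_turns):
        response = target.respond(prompt)
        if judge.is_jailbreak(goal, response):
            library.update(tactic, success=True)
            return True
        # Attempt failed: rewrite the prompt for the same tactic using the response.
        prompt = attacker.refine_prompt(goal, tactic, prompt, response)
    library.update(tactic, success=False)
    return False
```

This is only meant to make the two learning levels concrete: the library persists across calls to `run_attack` (global learning), while the refinement loop inside a single call adapts the prompt to the target's responses (local learning). See the repository linked above for the actual implementation.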
|
|
|
|
|
|