---
license: cc-by-4.0
---

# GALA (official)
|
|
Official implementation for: Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning

Code: https://github.com/SCccc21/GALA.git \
Paper: https://arxiv.org/abs/2504.01278
|
|
|
|
|
## Abstract
|
|
|
|
|
The exploitation of large language models (LLMs) for malicious purposes poses significant security risks as these models become more powerful and widespread. While most existing red-teaming frameworks focus on single-turn attacks, real-world adversaries typically operate in multi-turn scenarios, iteratively probing for vulnerabilities and adapting their prompts based on the target model's responses. In this paper, we propose GALA, a novel multi-turn red-teaming agent that emulates sophisticated human attackers through complementary learning dimensions: global tactic-wise learning, which accumulates knowledge over time and generalizes to new attack goals, and local prompt-wise learning, which refines prompt implementations for a specific goal when initial attempts fail. Unlike previous multi-turn approaches that rely on fixed strategy sets, GALA enables the agent to identify new jailbreak tactics, develop a goal-based tactic selection framework, and refine prompt formulations for the selected tactics. Empirical evaluations on JailbreakBench demonstrate our framework's superior performance, achieving over 90% attack success rates against GPT-3.5-Turbo and Llama-3.1-70B within 5 conversation turns, outperforming state-of-the-art baselines. These results highlight the effectiveness of dynamic learning in identifying and exploiting model vulnerabilities in realistic multi-turn scenarios.
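
The abstract describes a two-level learning loop: a global tactic library that accumulates knowledge across attack goals, and a local, per-goal refinement of the prompt over a small number of conversation turns. As a rough conceptual illustration only (a minimal sketch; the class and method names below, including `TacticLibrary`, `write_prompt`, `refine_prompt`, and the `target`/`attacker`/`judge` interfaces, are hypothetical and not taken from the GALA codebase), such a dual-level loop could be structured like this:

```python
# Conceptual sketch of the dual-level loop described in the abstract.
# Global level: a tactic library shared across goals, updated after each attack.
# Local level: per-goal prompt refinement when a turn fails to jailbreak the target.
from dataclasses import dataclass, field


@dataclass
class TacticLibrary:
    """Global tactic-wise learning: tactics and their running success estimates."""
    tactics: dict[str, float] = field(default_factory=dict)  # tactic name -> score

    def select(self, goal: str) -> str:
        # Hypothetical goal-based selection: fall back to a default when empty.
        return max(self.tactics, key=self.tactics.get) if self.tactics else "direct_request"

    def update(self, tactic: str, success: bool) -> None:
        # Exponential moving average of observed success for this tactic.
        prev = self.tactics.get(tactic, 0.0)
        self.tactics[tactic] = 0.9 * prev + 0.1 * float(success)


def run_attack(goal: str, library: TacticLibrary, target, attacker, judge,
               max_turns: int = 5) -> bool:
    """Local prompt-wise learning: refine the prompt for one goal, turn by turn."""
    tactic = library.select(goal)
    prompt = attacker.write_prompt(goal, tactic)          # hypothetical helper
    for _ in range(max_turns):
        response = target.respond(prompt)
        if judge.is_jailbreak(goal, response):
            library.update(tactic, success=True)
            return True
        # Attempt failed: rewrite the prompt for the same tactic using the response.
        prompt = attacker.refine_prompt(goal, tactic, prompt, response)
    library.update(tactic, success=False)
    return False
```

This is only meant to make the two learning levels concrete: the library persists across calls to `run_attack` (global learning), while the refinement loop inside a single call adapts the prompt to the target's responses (local learning). See the repository linked above for the actual implementation.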
|
|
|
|
|
|