QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL
Abstract
QUASAR, an RL framework using tool-augmented LLMs, improves quantum circuit generation and optimization through simulator-based verification and hierarchical rewards, achieving 99.31% Pass@1 validity and outperforming industrial LLMs.
Designing and optimizing task-specific quantum circuits is crucial for leveraging the advantage of quantum computing. Recent large language model (LLM)-based quantum circuit generation has emerged as a promising automatic solution. However, fundamental challenges remain unaddressed: (i) parameterized quantum gates require precise numerical values for optimal performance, and these values depend on multiple aspects, including the number of quantum gates, their parameters, and the layout/depth of the circuit; (ii) LLMs often generate low-quality or incorrect quantum circuits due to a lack of quantum domain-specific knowledge. We propose QUASAR, an agentic reinforcement learning (RL) framework for quantum circuit generation and optimization based on tool-augmented LLMs. To align the LLM with quantum-specific knowledge and improve the quality of the generated circuits, QUASAR introduces (i) a quantum circuit verification approach that uses external quantum simulators and (ii) a hierarchical reward mechanism for RL training. Extensive evaluation shows improvements in both the syntactic and semantic performance of the generated quantum circuits. When augmenting a 4B LLM, QUASAR achieves a validity of 99.31% in Pass@1 and 100% in Pass@10, outperforming the industrial LLMs GPT-4o, GPT-5, and DeepSeek-V3 as well as several supervised fine-tuning (SFT)-only and RL-only baselines.
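For a concrete picture of the simulator-in-the-loop verification the abstract describes, here is a minimal sketch using the Qiskit stack (OpenQASM 3 parsing additionally requires the `qiskit-qasm3-import` package). This is an illustration of the general idea, not QUASAR's actual tool interface.

```python
# Minimal sketch of simulator-based verification of an LLM-generated
# OpenQASM 3.0 circuit. Assumes qiskit and qiskit-qasm3-import are
# installed; QUASAR's actual verification tooling may differ.
from qiskit import qasm3
from qiskit.quantum_info import Statevector

generated_qasm = """
OPENQASM 3.0;
include "stdgates.inc";
qubit[2] q;
h q[0];
cx q[0], q[1];
"""

def verify_circuit(qasm_src: str):
    """Return (is_valid, statevector) for a candidate circuit."""
    try:
        circuit = qasm3.loads(qasm_src)   # syntax check: parse OpenQASM 3.0
    except Exception:
        return False, None                # invalid program -> rejected
    # Classical simulation (assumes a measurement-free circuit).
    state = Statevector.from_instruction(circuit)
    return True, state

ok, state = verify_circuit(generated_qasm)
print("valid:", ok)
if ok:
    print("output distribution:", state.probabilities_dict())
```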
Community
🔥 Concise & Promotional
🚀 QUASAR sets a new SOTA in quantum circuit generation with tool-augmented RL:
✅ 99.31% Pass@1 syntactic correctness (↑ over GPT-5, GPT-4o, DeepSeek-V3)
✅ 100% Pass@10 with stronger semantic alignment
✅ Hierarchical 4-level reward for syntax, distribution, expectation value & optimization
👉 Paper: arXiv:2510.00967
👉 Model: Benyucong/rl_quantum_4b
👉 Code: github.com/benyucong/QUASAR
🧠 Technical & Insightful
We introduce QUASAR, an agentic RL framework that equips LLMs with quantum-aware reasoning via external simulators and hierarchical rewards.
Unlike prior SFT-only or RL-only methods, QUASAR combines supervised fine-tuning with reinforcement learning guided by four reward levels (a toy sketch follows the list):
- ✅ Syntax reward (valid OpenQASM 3.0 circuits)
- ✅ Distributional alignment (Jensen–Shannon distance)
- ✅ Expectation-value reward (Hamiltonian alignment)
- ✅ Optimization-progress reward (fewer optimization steps to convergence)
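To make the hierarchy concrete, here is a toy sketch of how these four levels could compose into a scalar reward. The equal weighting, the sign convention on the energy term, and the `opt_steps`/`max_steps` bookkeeping are illustrative assumptions, not the paper's exact formulation; it assumes Qiskit and SciPy.

```python
# Toy sketch of a four-level hierarchical reward. Weights, thresholds,
# and helper names are illustrative assumptions, not the paper's design.
import numpy as np
from scipy.spatial.distance import jensenshannon
from qiskit import qasm3
from qiskit.quantum_info import Statevector, SparsePauliOp

def hierarchical_reward(qasm_src, target_probs, hamiltonian,
                        opt_steps, max_steps=100):
    # Level 1: syntax -- the circuit must parse as valid OpenQASM 3.0.
    try:
        circuit = qasm3.loads(qasm_src)
    except Exception:
        return 0.0
    reward = 1.0

    state = Statevector.from_instruction(circuit)
    probs = state.probabilities()

    # Level 2: distributional alignment via Jensen-Shannon distance
    # (1 - JS distance, so an exact match scores 1).
    reward += 1.0 - jensenshannon(probs, target_probs, base=2)

    # Level 3: expectation value of the problem Hamiltonian
    # (lower energy -> higher reward; the sign flip is an assumption).
    reward += -np.real(state.expectation_value(hamiltonian))

    # Level 4: optimization progress -- fewer parameter-optimization
    # steps to convergence earn a larger bonus.
    reward += 1.0 - opt_steps / max_steps
    return reward

# Hypothetical call with a uniform 2-qubit target and a ZZ Hamiltonian:
# r = hierarchical_reward(src, np.ones(4) / 4, SparsePauliOp("ZZ"), opt_steps=20)
```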
📊 Results:
- 99.31% Pass@1 validity, 100% Pass@10
- +12.95% higher Successful Rate of Expectation Value vs. RL-only GRPO
- 1.65× higher High-Quality Circuit Ratio than GPT-4o, 1.50× higher than GPT-5
QUASAR shows LLMs can internalize useful ansatz patterns & parameter initializations for QAOA/VQE—bridging the gap between general LLMs and domain-specific quantum code generation.
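For readers unfamiliar with the ansatz patterns mentioned above, here is a minimal hand-built single-layer QAOA circuit for a toy 3-qubit MaxCut instance; the edge list and parameter values are placeholders for illustration, not from the paper.

```python
# Minimal single-layer QAOA ansatz for a toy 3-qubit MaxCut instance.
# Edge list and parameter values are illustrative only.
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter

gamma, beta = Parameter("gamma"), Parameter("beta")
edges = [(0, 1), (1, 2)]          # hypothetical problem graph

qc = QuantumCircuit(3)
qc.h(range(3))                    # uniform superposition
for i, j in edges:                # cost layer: exp(-i * gamma * Z_i Z_j)
    qc.rzz(2 * gamma, i, j)
for q in range(3):                # mixer layer: exp(-i * beta * X_q)
    qc.rx(2 * beta, q)

# A good initialization of (gamma, beta) is part of what QUASAR trains
# the LLM to produce; the values below are placeholders.
bound = qc.assign_parameters({gamma: 0.8, beta: 0.4})
print(bound.draw())
```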
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Quantum Verifiable Rewards for Post-Training Qiskit Code Assistant (2025)
- LLM-Guided Ansätze Design for Quantum Circuit Born Machines in Financial Generative Modeling (2025)
- Vectorized Attention with Learnable Encoding for Quantum Transformer (2025)
- Bridging Classical and Quantum Computing for Next-Generation Language Models (2025)
- Quantum Architecture Search for Solving Quantum Machine Learning Tasks (2025)
- QMill: Representative Quantum Data Generation for Quantum Machine Learning Utility (2025)
- Quantum Machine Learning for Optimizing Entanglement Distribution in Quantum Sensor Circuits (2025)