arxiv:2603.00889

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Published on Mar 1
· Submitted by Xinyu Zhu on Mar 3
· Apple
Abstract

A synthetic reasoning dataset called CHIMERA is introduced to overcome data-centric challenges in training large language models for cross-domain reasoning, achieving performance comparable to much larger models.

AI-generated summary

Large Language Models (LLMs) have recently exhibited remarkable reasoning capabilities, largely enabled by supervised fine-tuning (SFT)- and reinforcement learning (RL)-based post-training on high-quality reasoning data. However, reproducing and extending these capabilities in open and scalable settings is hindered by three fundamental data-centric challenges: (1) the cold-start problem, arising from the lack of seed datasets with detailed, long Chain-of-Thought (CoT) trajectories needed to initialize reasoning policies; (2) limited domain coverage, as most existing open-source reasoning datasets are concentrated in mathematics, with limited coverage of broader scientific disciplines; and (3) the annotation bottleneck, where the difficulty of frontier-level reasoning tasks makes reliable human annotation prohibitively expensive or infeasible. To address these challenges, we introduce CHIMERA, a compact synthetic reasoning dataset comprising 9K samples for generalizable cross-domain reasoning. CHIMERA is constructed with three key properties: (1) it provides rich, long CoT reasoning trajectories synthesized by state-of-the-art reasoning models; (2) it has broad and structured coverage, spanning 8 major scientific disciplines and over 1K fine-grained topics organized via a model-generated hierarchical taxonomy; and (3) it employs a fully automated, scalable evaluation pipeline that uses strong reasoning models to cross-validate both problem validity and answer correctness. We use CHIMERA to post-train a 4B Qwen3 model. Despite the dataset's modest size, the resulting model achieves strong performance on a suite of challenging reasoning benchmarks, including GPQA-Diamond, AIME 24/25/26, HMMT 25, and Humanity's Last Exam, approaching or matching the reasoning performance of substantially larger models such as DeepSeek-R1 and Qwen3-235B.

Community

Paper author · Paper submitter

We introduce CHIMERA, a compact but high-difficulty synthetic reasoning dataset with long Chain-of-Thought trajectories and broad multi-disciplinary coverage, designed for reasoning post-training of large language models.

Dataset

CHIMERA contains 9,225 expert-level problems spanning 8 subjects (Mathematics, Computer Science, Chemistry, Physics, Literature, History, Biology, Linguistics) and 1,179 fine-grained topics, all synthesized by GPT-5. Each problem comes with:

  • A concise ground-truth answer and an authoritative reference solution (both GPT-5-generated),
  • A long-form model solution with thinking traces from Qwen3-235B-A22B-Thinking-2507 or Qwen3.5-397B-A17B,
  • Automated correctness labels from a GPT-5 + o4-mini verification panel.

Unlike existing reasoning datasets that are heavily math-focused or limited in solution length, CHIMERA provides structured domain diversity and long-horizon reasoning traces without any human annotation. Try our dataset on the Hugging Face Hub: TianHongZXY/CHIMERA 🤗.
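The verification panel described above amounts to an agreement rule: a sample is kept only if every panel model judges the problem well-posed and independently confirms the reference answer. Here is a minimal sketch of that filtering logic; the `Verifier` callables and the toy `strict`/`lenient` checkers stand in for real model calls and are purely hypothetical, not the paper's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Sample:
    problem: str
    answer: str

# A panel member returns (problem_is_valid, answer_is_correct).
Verifier = Callable[[Sample], Tuple[bool, bool]]

def cross_validate(sample: Sample, panel: List[Verifier]) -> bool:
    """Keep a sample only when every panel member agrees that the
    problem is well-posed AND the reference answer is correct."""
    verdicts = [verify(sample) for verify in panel]
    return all(valid and correct for valid, correct in verdicts)

def filter_dataset(samples: List[Sample], panel: List[Verifier]) -> List[Sample]:
    """Retain only samples that pass unanimous cross-validation."""
    return [s for s in samples if cross_validate(s, panel)]

# Toy stand-ins for the two verifier models in the panel.
strict = lambda s: (len(s.problem) > 0, s.answer == "42")
lenient = lambda s: (True, s.answer != "")

samples = [
    Sample("What is 6*7?", "42"),
    Sample("What is 6*7?", "41"),  # wrong answer: rejected by the panel
]
kept = filter_dataset(samples, [strict, lenient])
```

Requiring unanimous agreement rather than a majority vote trades recall for precision, which is the sensible default when the filtered data is used directly as SFT supervision.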

Models

We train Qwen3-4B-Thinking-2507 on CHIMERA through SFT followed by RL, yielding gains on nearly every major reasoning benchmark:

| Benchmark | Qwen3-4B-Thinking-2507 | CHIMERA-4B-SFT | CHIMERA-4B-RL |
|---|---|---|---|
| GPQA-Diamond | 65.8 | 68.8 | 70.1 |
| AIME 2024 | 81.6 | 86.5 | 86.9 |
| AIME 2025 | 81.0 | 79.8 | 80.7 |
| AIME 2026 | 80.8 | 80.3 | 82.7 |
| HMMT Feb 2025 | 59.2 | 63.1 | 65.7 |
| HMMT Nov 2025 | 57.3 | 66.3 | 67.0 |
| HLE | 7.3 | 9.0 | 9.0 |

Models: CHIMERA-4B-SFT | CHIMERA-4B-RL
