s3-8-3-3-20steps

s3 is a search agent trained with reinforcement learning (RL) that learns to plan retrieval and answer questions efficiently. This release provides weights for research replication only. For usage, training, and evaluation, follow our GitHub repo (we intentionally do not include inference snippets here).

📄 Reference: “s3: You Don’t Need That Much Data to Train a Search Agent via RL” (EMNLP 2025 Main).
🧑‍💻 GitHub: https://github.com/pat-jj/s3

What is in this repo?

This repo is a Hugging Face model folder containing tokenizer files and sharded *.safetensors checkpoints exported from our VERL training runs (the “actor” policy). The file layout mirrors the training outputs (e.g., config.json, tokenizer.json, and model-00001-of-00004.safetensors).
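Because the checkpoint is split into numbered shards, an incomplete download is easy to miss. The hypothetical helper below is a minimal sketch, not part of the s3 codebase: it assumes every weight file follows the model-XXXXX-of-YYYYY.safetensors naming shown above and reports which shard indices are absent from a directory listing. It does not open the files or read any index file.

```python
import re

# Assumed shard naming, e.g. "model-00001-of-00004.safetensors".
SHARD_RE = re.compile(r"^model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(filenames):
    """Return the sorted shard indices missing from `filenames`.

    Non-shard files (config.json, tokenizer.json, ...) are ignored.
    Raises ValueError if no shards are found or shards disagree on the total.
    """
    totals = set()
    present = set()
    for name in filenames:
        m = SHARD_RE.match(name)
        if m:
            present.add(int(m.group(1)))
            totals.add(int(m.group(2)))
    if not totals:
        raise ValueError("no sharded *.safetensors files found")
    if len(totals) > 1:
        raise ValueError(f"inconsistent shard totals: {sorted(totals)}")
    total = totals.pop()
    # Expected indices are 1..total; report whichever are not present.
    return sorted(set(range(1, total + 1)) - present)


if __name__ == "__main__":
    files = [
        "config.json",
        "tokenizer.json",
        "model-00001-of-00004.safetensors",
        "model-00002-of-00004.safetensors",
        "model-00004-of-00004.safetensors",
    ]
    print(missing_shards(files))  # → [3]
```

On a local copy, passing os.listdir(local_dir) and getting an empty list back means every shard implied by the naming scheme is present.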

Important notes

We highly recommend training the model yourself via the GitHub repo. In our experience, testing/inference can take much longer than training.
Do not treat these weights as a drop-in general QA system; they are intended for the s3 pipelines described in the paper and codebase.
To run anything, please follow the GitHub instructions end-to-end (env setup, datasets, evaluation scripts, and RL configs).

Intended use & limitations

Research replication, ablations, and educational study of on-policy RL for retrieval-augmented search agents. Commercial or safety-critical use is not advised without extensive review and additional safeguards.

Citation

@inproceedings{jiang2025s3,
  title = {s3: You Don't Need That Much Data to Train a Search Agent via RL},
  author = {Jiang, Pengcheng and Xu, Xueqiang and Lin, Jiacheng and Xiao, Jinfeng and Wang, Zifeng and Sun, Jimeng and Han, Jiawei},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
}

Last updated: 2025-09-29


Format: Safetensors
Model size: 8B params
Tensor type: BF16
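As a back-of-the-envelope check on the figures above, BF16 stores each parameter in 2 bytes, so the raw weights of a nominal 8B-parameter model take about 16 GB. This sketch assumes exactly 8 × 10^9 parameters (the true count may differ slightly) and counts weights only; activations, the KV cache, and optimizer state during training all add to real memory use.

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit (2-byte) floating-point format


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Bytes needed to store raw weights only (no activations, KV cache, or optimizer state)."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    n = 8_000_000_000  # nominal "8B params"; assumed, not the exact count
    b = weight_bytes(n)
    print(f"{b / 1e9:.1f} GB ({b / 2**30:.1f} GiB)")  # → 16.0 GB (14.9 GiB)
```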
