This repository contains the RL-trained model accompanying our paper, A^2Search: Ambiguity-Aware Question Answering with Reinforcement Learning. More details are available at https://github.com/zfj1998/A2Search
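
As a minimal sketch of how the checkpoint might be loaded, assuming it works with the standard Hugging Face `transformers` Auto classes (this page does not confirm that, nor the prompt format used in the paper), the helper names `load_model` and `generate` below are hypothetical. BF16 is chosen to match the tensor type listed for this repository.

```python
MODEL_ID = "zfj1998/A2Search-3B-Instruct"  # repository name as shown on this page


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in BF16.

    Assumption: the checkpoint loads through the generic Auto classes.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()  # inference mode (disables dropout)
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a continuation for a plain-text prompt and return only the new text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated tokens are decoded.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Under these assumptions, `tokenizer, model = load_model()` followed by `generate(tokenizer, model, "your question here")` would return the model's raw continuation. The search and answer pipeline described in the paper is not reproduced here; the GitHub repository linked above is the place to look for it.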

Model size: 3B params (Safetensors)

Tensor type: BF16

Model tree for zfj1998/A2Search-3B-Instruct

Base model: Qwen/Qwen2.5-3B

Finetuned: this model