RLHFlow
/

Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard

Model card Files Files and versions

Checkpoint from step=400 and trained on the hard prompt set.

Downloads last month: 14

Safetensors

Model size

4B params

Tensor type

BF16

·

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including RLHFlow/Qwen3-4B-Instruct-2507-Reinforce-Ada-balance-hard

Reinforce-Ada

Training & test sets and finetuned models • 19 items • Updated 4 days ago • 2