rlvr-weak-supervision

Models from "When Can LLMs Learn to Reason with Weak Supervision?": Llama-3.2-3B with continual pre-training and Thinking SFT.

pavelslab-nyu/Llama-3.2-3B-ThinkSFT (3B)
pavelslab-nyu/Llama-3.2-3B-CPT-Math-ThinkSFT (3B)
pavelslab-nyu/Llama-3.2-3B-CPT-Math (3B)