ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought • Paper • 2601.23184 • Published 6 days ago
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js4 Updated Sep 18, 2025
happyfighting/verl_logic_kk_Qwen2.5-3B-Instruct-kklogic_grpo_baseline_53_ccpo_bce_beta0.03_lin_r_js2 Updated Sep 18, 2025