combines reinforcement learning (RL) and large language models (LLMs) to improve exploration using diverse tool generation during inference
Gabriel Bo
gabrielbo
AI & ML interests
NLP, Scaling, Test-time Compute
datasets 9
gabrielbo/swirl-trajectories-mmlu-pro • 24.8k • 7 • 2
gabrielbo/explore-rl-hotpota-trajectories • 2
gabrielbo/gpqa-llama-3-8b-verifier • 910 • 72
gabrielbo/mmlu-college-llama-3-8b-verifiers • 870 • 4
gabrielbo/mmlu-pro-specific-choice-scored • 870 • 4
gabrielbo/mmlu-pro-baseline-scored • 87 • 5
gabrielbo/mmlu-pro-verifiers-specific-choice • 870 • 4
gabrielbo/mmlu-pro-verifiers-baseline • 87 • 8
gabrielbo/mmlu-pro-justifications-llama-3 • 87 • 4