Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL By toslali-ibm and 5 others • Jun 3 • 81
Article The N Implementation Details of RLHF with PPO By vwxyzjn and 2 others • Oct 24, 2023 • 63
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 199
aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning Image-Text-to-Text • 8B • Updated 5 days ago • 1.28k • 11
Article How to generate text: using different decoding methods for language generation with Transformers By patrickvonplaten • Mar 1, 2020 • 231
Running 2.98k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters