Gullal Singh Cheema
gullalc
AI & ML interests
Multimodality, Vision and Language, Cross-modal relations, Video Understanding
Recent Activity
liked a dataset 42 minutes ago
HuggingFaceM4/FineVision
reacted to sergiopaniego's post with 🔥 about 1 month ago
Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL?
🧑‍🍳 We've got you covered!!
NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace's Cookbook.
Go to the recipe: https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl
Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images.
upvoted a collection about 1 month ago
gpt-oss
Organizations
None yet