roboticslaburjc-org (RoboticsLabURJC)

posted an update 1 day ago

Post

1357

OpenAI's open models are out! 💃

Try: https://www.gpt-oss.com/
Learn: https://huggingface.co/blog/welcome-openai-gpt-oss

1 reply

·

sergiopaniego

posted an update 3 days ago

Post

3203

Want to learn how to align a Vision Language Model (VLM) for reasoning using GRPO and TRL? 🌋

🧑‍🍳 We've got you covered!!

NEW multimodal post training recipe to align a VLM using TRL in @HuggingFace 's Cookbook.

Go to the recipe 👉https://huggingface.co/learn/cookbook/fine_tuning_vlm_grpo_trl

Powered by the latest TRL v0.20 release, this recipe shows how to teach Qwen2.5-VL-3B-Instruct to reason over images 🌋

sergiopaniego

posted an update 3 days ago

Post

4393

Just included example scripts for aligning models using GSPO (including VLM example) 🙆‍♂️🙆‍♂️

GSPO is the latest RL alignment algo by @Alibaba_Qwen and it's already supported in the latest TRL v0.20 release.

Super-easy-to-get-started example scripts below, GO run them!👩‍💻👩‍💻

🧑‍🎨 Script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo.py
🦄 VLM script: https://github.com/huggingface/trl/blob/main/examples/scripts/gspo_vlm.py
🧩 More TRL examples: https://huggingface.co/docs/trl/main/en/example_overview
🧙‍♂️ GSPO paper: Group Sequence Policy Optimization (2507.18071)

sergiopaniego

posted an update 8 days ago

Post

296

Did you miss this? 👓

🧙‍♂️vLLM + transformers integration just got upgraded with direct VLM support.

Select a VLM + model_impl=transformers and play via vLLM!

sergiopaniego

posted an update 9 days ago

Post

2561

We just released TRL v0.20 with major multimodal upgrades!

👁️ VLM support for GRPO (highly requested by the community!)
🎞️ New GSPO trainer (from @Qwen , released last week, VLM-ready)
🐙 New MPO trainer (multimodal by design, as in the paper)

📝 Full release notes here: https://github.com/huggingface/trl/releases/tag/v0.20.0

sergiopaniego

posted an update 15 days ago

Post

1165

Yet Another New Multimodal Fine-Tuning Recipe 🥧

🧑‍🍳 In this @HuggingFace Face Cookbook notebook, we demonstrate how to align a multimodal model (VLM) using Mixed Preference Optimization (MPO) using trl.

💡 This recipe is powered by the new MPO support in trl, enabled through a recent upgrade to the DPO trainer!

We align the multimodal model using multiple optimization objectives (losses), guided by a preference dataset (chosen vs. rejected multimodal pairs).

Check it out! ➡️ https://huggingface.co/learn/cookbook/fine_tuning_vlm_mpo

2 replies

·

sergiopaniego

posted an update 20 days ago

Post

1650

🧑‍🍳 New Multimodal Fine-Tuning Recipe 🧑‍🍳

⚡️ In this new @huggingface Cookbook recipe, I walk you though the process of fine tuning a Visual Language Model (VLM) for Object Detection with Visual Grounding, using TRL.

🔍 Object detection typically involves detecting categories in images (e.g., vase).

By combining it with visual grounding, we add contextual understanding so instead of detecting just "vase", we can detect "middle vase" in an image.

VLMs are super powerful!

In this case, I use PaliGemma 2 which already supports object detection and extend it to also add visual grounding.

🤗 Check it out here: https://huggingface.co/learn/cookbook/fine_tuning_vlm_object_detection_grounding

sergiopaniego

posted an update 21 days ago

Post

1612

Multiple NEW notebooks and scripts added to the Hugging Face Gemma recipes repo!

Thanks to the community 🫶, we're adding more and more recipes using Gemma 💎

Fine tuning for all modalities, function calling, RAG...

Repo: https://github.com/huggingface/huggingface-gemma-recipes

We're also open to new ideas from the community 🤗!

1 reply

·

sergiopaniego

posted an update 24 days ago

Post

390

Loved this paper! ♥️

Authors benchmark multimodal models on vision tasks (detection, segmentation...) using clever prompting tricks.

📄 Results: VLMs are solid generalists but still lag behind SOTA task-specific models — especially on geometric tasks vs. semantic ones.

paper: How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks (2507.01955)

sergiopaniego

posted an update 24 days ago

Post

260

You can already play with two of the latest most impressive models on HF via @novita-ai as Inference Provider 🚨

🌌 Kimi K2: 1T params model, MoE beast for coding, reasoning and agentic tasks
🔮 GLM-4.1V-9B-Thinking: VLM + deep reasoning model

Kimi K2: moonshotai/Kimi-K2-Instruct
GLM-4.1V-9B-Thinking: https://huggingface.co/THUDM/GLM-4.1V-9B-Thinking

sergiopaniego

posted an update 24 days ago

Post

224

Over 1K already on @huggingface !!

sergiopaniego

posted an update 29 days ago

Post

1600

Test SmolLM3, the newest fully open model released by @HuggingFaceTB !

It's smol (3B), multilingual (6 languages), comes with dual mode reasoning (think/no_think modes) and supports long-context (128k).

Try it now in the notebook below!! ⬇️

Colab notebook: https://colab.research.google.com/github/sergiopaniego/samples/blob/main/smollm3_3b_inference.ipynb
notebook: https://github.com/sergiopaniego/samples/blob/main/smollm3_3b_inference.ipynb
blog: https://huggingface.co/blog/smollm3

sergiopaniego

posted an update about 1 month ago

Post

1997

Updated my HF Space for vibe testing smol VLMs on object detection, visual grounding, keypoint detection & counting! 👓

🆕 Compare Qwen2.5 VL 3B vs Moondream 2B side-by-side with annotated images & text outputs.

Try examples or test your own images! 🏃

📱Space: sergiopaniego/vlm_object_understanding

sergiopaniego

posted an update about 1 month ago

Post

1062

📣 CALL FOR CONTRIBUTORS! 📣

Following last week’s full release of Gemma 3n, we launched a dedicated recipes repo to explore and share use cases. We already added some! 🧑‍🍳

Now we’re inviting the community to contribute and showcase how these models shine! ✨

Let them cook.

Check it out: https://github.com/huggingface/huggingface-gemma-recipes/issues/4

1 reply

·

sergiopaniego

posted an update about 1 month ago

Post

470

One of my favorite perks of the Hugging Face Pro plan: ✨Dev mode✨

Connect your HF Space to VS Code and just build — with hot reload out of the box.

Game changer for fast prototyping. 💻

Google Colab made AI accessible. Now HF Spaces are doing it too! 😍

💡 New Hugging Face pricing: http://hf.co/pricing
💡 More details: https://huggingface.co/learn/cookbook/en/enterprise_cookbook_dev_spaces