Activity Feed

AI & ML interests

None defined yet.

Recent Activity

sergiopaniego 
posted an update 4 days ago
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks
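
For a taste of what these notebooks cover, a minimal SFT run with TRL looks roughly like the sketch below (assuming a recent TRL release; the small model and dataset names are just the kind of placeholders commonly used in TRL examples, not a recommendation):

```python
# Minimal SFT sketch with TRL; model and dataset are small placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # chat-formatted dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # small enough to fit on a free Colab GPU
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-sft"),
)
trainer.train()
```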
Bils 
posted an update 6 days ago
We just published a workflow that automates trend-style celebrity selfie videos — from image generation to cinematic transitions.

This template is a mini creative factory:
→ Generate realistic “celebrity selfie” images
→ Produce clean, cinematic transitions ready for Shorts/Reels
→ Clear structure, easy to customize for your brand

📌 Template link:
https://n8n.io/workflows/12119-create-celebrity-selfie-images-and-transition-videos-with-gpt-4-seeddream-and-kling/
sergiopaniego 
posted an update 15 days ago
The Christmas holidays are here! 🎄
Thinking about learning something new in AI?

@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)

Let’s explore them!

🧠 𝗟𝗟𝗠 𝗖𝗼𝘂𝗿𝘀𝗲: large language models with HF tools
https://huggingface.co/learn/llm-course

🤖 𝗔𝗴𝗲𝗻𝘁𝘀 𝗖𝗼𝘂𝗿𝘀𝗲: build and deploy AI agents
https://huggingface.co/learn/agents-course

🎨 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗖𝗼𝘂𝗿𝘀𝗲: diffusion models with 🤗 Diffusers
https://huggingface.co/learn/diffusion-course

🔊 𝗔𝘂𝗱𝗶𝗼 𝗖𝗼𝘂𝗿𝘀𝗲: transformers for audio tasks
https://huggingface.co/learn/audio-course

🎮 𝗗𝗲𝗲𝗽 𝗥𝗟 𝗖𝗼𝘂𝗿𝘀𝗲: deep reinforcement learning
https://huggingface.co/learn/deep-rl-course

👁️ 𝗖𝗼𝗺𝗺𝘂𝗻𝗶𝘁𝘆 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗩𝗶𝘀𝗶𝗼𝗻 𝗖𝗼𝘂𝗿𝘀𝗲: modern computer vision with HF
https://huggingface.co/learn/computer-vision-course

🦾 𝗥𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗖𝗼𝘂𝗿𝘀𝗲 (𝗟𝗲𝗥𝗼𝗯𝗼𝘁): learning-based robotics
https://huggingface.co/learn/robotics-course

🧩 𝗠𝗖𝗣 𝗖𝗼𝘂𝗿𝘀𝗲: Model Context Protocol explained
https://huggingface.co/learn/mcp-course

🧪 𝗔 𝗦𝗺𝗼𝗹 𝗖𝗼𝘂𝗿𝘀𝗲: post-training AI models
https://huggingface.co/learn/a-smol-course

🕹️ 𝗠𝗟 𝗳𝗼𝗿 𝗚𝗮𝗺𝗲𝘀: AI in game development
https://huggingface.co/learn/ml-for-games-course

🧊 𝗠𝗟 𝗳𝗼𝗿 𝟯𝗗: machine learning for 3D data
https://huggingface.co/learn/ml-for-3d-course

📘 𝗢𝗽𝗲𝗻-𝗦𝗼𝘂𝗿𝗰𝗲 𝗔𝗜 𝗖𝗼𝗼𝗸𝗯𝗼𝗼𝗸: practical AI notebooks
https://huggingface.co/learn/cookbook

All of them can be found here: https://huggingface.co/learn
sergiopaniego 
posted an update 19 days ago
Google DeepMind releases FunctionGemma, a 240M model specialized in 🔧 tool calling, built for fine-tuning

TRL has day-0 support. To celebrate, we’re sharing 2 new resources:

> Colab guide to fine-tune it for 🌐 browser control with BrowserGym OpenEnv
> Standalone training script

> Colab notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_functiongemma_browsergym_openenv.ipynb
> Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym_llm.py (the command to run it is included in the script)
> More notebooks in TRL: https://huggingface.co/docs/trl/example_overview#notebooks
sergiopaniego 
posted an update 25 days ago
🎄 last talk of the year about open AI and HF today at Universidad Rey Juan Carlos for undergrad students

always a pleasure to be back at my alma mater

🎅 slides: https://github.com/sergiopaniego/talks
sergiopaniego 
posted an update 26 days ago
TRL now includes agent training support for GRPO‼️

Train 🕵️ agents with 🔧 tools, enabling interaction with external functions and APIs.

And of course, a new notebook and scripts to get you up to speed

📘 notebook tutorial: https://github.com/huggingface/trl/blob/main/examples/notebooks/grpo_agent.ipynb

📂 script examples: https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_agent.py

📦 TRL v0.26.0 release: https://github.com/huggingface/trl/releases/tag/v0.26.0
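
For orientation, a bare-bones GRPO setup with a custom reward function looks roughly like the sketch below (assuming a recent TRL release; the agent/tool wiring itself is what the notebook and script above add on top):

```python
# Bare-bones GRPO sketch with a toy reward; the agent/tool setup lives in the linked examples.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # prompt-only dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 100 characters long.
    return [-abs(100 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="qwen-grpo"),
    train_dataset=dataset,
)
trainer.train()
```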
anakin87 
posted an update 26 days ago
💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.
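
One concrete way to picture the "format adherence" part of such a reward (a sketch of my own, not taken from the post): grant reward only when the completion wraps its reasoning in explicit think tags before stating an answer.

```python
import re

# Sketch of a format-adherence reward: 1.0 if the completion puts its reasoning
# inside <think>...</think> and then states an answer, 0.0 otherwise.
THINK_PATTERN = re.compile(r"<think>.+?</think>\s*\S", re.DOTALL)

def format_reward(completion: str) -> float:
    return 1.0 if THINK_PATTERN.match(completion.strip()) else 0.0

print(format_reward("<think>center is the strongest opening</think> I play the middle square."))  # 1.0
print(format_reward("I play the middle square."))                                                  # 0.0
```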
KingNish 
posted an update 27 days ago
Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1
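
For readers who want to try the hybrid, the parameter split itself is simple to reproduce; below is a rough sketch of the idea (the `Muon` constructor is a stand-in for whichever implementation you use, so it is left commented out):

```python
# Sketch of the hybrid split: matrices (ndim >= 2) would go to Muon,
# while biases and norm scales (ndim < 2) stay on AdamW.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(64, 128), nn.LayerNorm(128), nn.Linear(128, 10))

muon_params  = [p for p in model.parameters() if p.ndim >= 2]   # weight matrices
adamw_params = [p for p in model.parameters() if p.ndim < 2]    # biases, norm parameters

adamw = torch.optim.AdamW(adamw_params, lr=1e-4, weight_decay=0.01)
# muon = Muon(muon_params, lr=2e-2)   # hypothetical import of a Muon implementation

# Training step: backward once, then step both optimizers.
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
adamw.step()        # muon.step() would go here as well
adamw.zero_grad()   # muon.zero_grad()
```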
sergiopaniego 
posted an update 27 days ago
ICYMI, you can fine-tune open LLMs using Claude Code

just tell it:
“Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”

and Claude submits a real training job on HF GPUs using TRL.

it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation

when it’s done, your model is on the Hub, ready to use

read more about the process: https://huggingface.co/blog/hf-skills-training
IliaLarchenko 
posted an update 27 days ago
🏆 BEHAVIOR Challenge 1st Place – Solution Summary

My team recently won 1st place in the BEHAVIOR Challenge at NeurIPS.
The competition focused on training a single policy to complete 50 long-horizon household tasks in simulation.

We built an end-to-end policy based on Pi0.5 with a bunch of custom modifications. Everything is open-sourced, and it should be useful for anyone exploring VLAs or adapting them to specific tasks.

Key Architecture Changes:
- Replaced language model with 50 trainable task embeddings (no text at all)
- Correlated noise for Flow Matching: ϵ ∼ N(0, 0.5I + 0.5Σ) using dataset action covariance
- Learnable mixed-layer attention: each action expert layer attends to a trainable mix of all VLM layers
- System 2 stage tracking: model predicts task stage, we smooth it with voting and feed it back as context
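
As a rough illustration of the correlated-noise bullet above, sampling ε ∼ N(0, 0.5I + 0.5Σ) can be done with a Cholesky factor of the mixed covariance (a sketch reconstructed from the formula, not the team's code; the shapes and the jitter term are assumptions):

```python
import numpy as np

# Sketch of ε ~ N(0, 0.5·I + 0.5·Σ), where Σ is the covariance of flattened action chunks.
rng = np.random.default_rng(0)
chunks = rng.standard_normal((1000, 30, 7))               # stand-in for dataset action chunks (T=30, D=7)
flat = chunks.reshape(len(chunks), -1)

sigma = np.cov(flat, rowvar=False)                         # Σ: dataset action covariance
cov = 0.5 * np.eye(sigma.shape[0]) + 0.5 * sigma           # 0.5·I + 0.5·Σ

chol = np.linalg.cholesky(cov + 1e-6 * np.eye(cov.shape[0]))   # small jitter for numerical stability
eps = rng.standard_normal((16, cov.shape[0])) @ chol.T         # 16 correlated noise samples
eps = eps.reshape(16, 30, 7)                                    # back to action-chunk shape
```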

Training:
- Multi-sample Flow Matching: 15 FM samples per VLM pass to reduce gradient variance
- Delta action space + per-timestamp normalization
- FAST auxiliary loss and stage prediction loss
- Trained on 224×224 RGB + proprioception only
- We use 4 fine-tuned checkpoints, all derived from a multi-task model trained on all 50 tasks

Inference Optimizations:
- Soft inpainting: predict 30 actions, execute 26, use 4 as input for the next chunk
- Correlation-aware guidance of inpainting to keep action chunks smooth
- 1.3× speedup via cubic spline compression
- General correction rule: reopen gripper after failed grasps

🔗 Code and Models:
- Code: https://github.com/IliaLarchenko/behavior-1k-solution
- Weights: IliaLarchenko/behavior_submission
- Paper: Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge (2512.06951)
sergiopaniego 
posted an update 28 days ago
We just released TRL v0.26.0!

It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements

sergiopaniego 
posted an update about 1 month ago
Want to get started with fine-tuning but don’t know where to begin? 🤓☝️

We’re expanding our collection of beginner-friendly free Colab notebooks so you can learn and fine-tune models using TRL at no cost

🔬 Check out the full list of free notebooks: https://huggingface.co/docs/trl/main/en/example_overview#notebooks

🔬 If you want more advanced content, we also have a lot to cover in the community tutorials: https://huggingface.co/docs/trl/community_tutorials

And now the obvious question: what would you like us to add next?