
Building on HF

Sergio Paniego PRO

sergiopaniego

huggingface


https://sergiopaniego.github.io/

AI & ML interests

None yet

Recent Activity

updated a dataset about 4 hours ago

huggingface-projects/Deep-RL-Course-Certification

new activity about 5 hours ago

agents-course/notebooks: fix: support Google Colab secrets for HF_TOKEN loading

reacted to qgallouedec's post with 🚀 about 5 hours ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (`Qwen/Qwen3.6-27B`, `Qwen/Qwen3.6-35B-A3B`) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with `{% generation %}` markers, tool-call response schema routing, and tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

```python
from trl import SFTConfig, SFTTrainer

# `dataset` is a conversational training dataset loaded beforehand
trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()
```

So does GRPO tool-calling: just pass `tools=[...]` to `GRPOTrainer`.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in `trl vllm-serve` (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now within reach), three more `{% generation %}` chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
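As a rough illustration of what assistant-only loss means: a chat template with `{% generation %}` markers lets the tokenizer report which tokens the assistant produced (in transformers, via `apply_chat_template(..., return_dict=True, return_assistant_tokens_mask=True)`), and training then ignores every other token. The sketch below is a hypothetical, dependency-free version of that masking step, not TRL's implementation. The helper name `mask_non_assistant_labels` and the toy token ids are made up, and it assumes the PyTorch convention that a label of -100 is skipped by the loss.

```python
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss: these positions add no loss.
IGNORE_INDEX = -100


def mask_non_assistant_labels(input_ids: List[int], assistant_mask: List[int]) -> List[int]:
    """Hypothetical helper: build labels that keep only assistant tokens.

    assistant_mask[i] == 1 marks a token inside the assistant's reply, i.e. the
    span a chat template wraps in {% generation %} ... {% endgeneration %}.
    Every other position (system/user turns, role headers) becomes IGNORE_INDEX.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Toy example with made-up ids: "<user> hi there <assistant> hello to you"
input_ids = [101, 7, 8, 102, 21, 22, 23]
assistant_mask = [0, 0, 0, 0, 1, 1, 1]
labels = mask_non_assistant_labels(input_ids, assistant_mask)
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

Only the three assistant tokens contribute to the loss here. Shifting labels by one position is left to the model, as Hugging Face causal LM heads do internally.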



sergiopaniego's buckets (5)

sergiopaniego/async-grpo-gsm8k-bucket

sergiopaniego/async-grpo-openr1-bucket

sergiopaniego/async-grpo-math500-bucket

sergiopaniego/async-grpo-test-bucket

sergiopaniego/browsergym-vlm-grpo-Qwen-Qwen3.5-2B-bucket