Sergio Paniego PRO
AI & ML interests
None yet
Recent Activity
new activity 19 minutes ago
agents-course/notebooks: fix: support Google Colab secrets for HF_TOKEN loading
reacted to qgallouedec's post with 🚀 38 minutes ago
TRL v1.3 ships day-one training support for Qwen 3.6 🚀
The new Qwen 3.6 family (`Qwen/Qwen3.6-27B`, `Qwen/Qwen3.6-35B-A3B`) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: a new training template with `{% generation %}` markers, tool-call response schema routing, and tiny test models for the VLM matrix.
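For context, `{% generation %}` markers wrap assistant turns in a chat template so the tokenizer can return an assistant-token mask, which is what `assistant_only_loss` relies on. A minimal sketch of such a template (not the actual Qwen 3.6 one):

```jinja
{#- Sketch: only assistant content sits inside generation markers,
    so only those tokens contribute to the loss -#}
{%- for message in messages -%}
{%- if message['role'] == 'assistant' -%}
{% generation %}{{ message['content'] }}{% endgeneration %}
{%- else -%}
{{ message['content'] }}
{%- endif -%}
{%- endfor -%}
```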
SFT with assistant-only loss works out of the box:
```python
from trl import SFTConfig, SFTTrainer

# `dataset` is assumed to be a conversational dataset loaded beforehand
trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()
```
So does GRPO tool-calling: just hand `tools=[...]` to `GRPOTrainer`.
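To illustrate the tool-call plumbing: the functions you pass in `tools=[...]` are typed, docstring'd Python callables that get summarized into an OpenAI-style JSON schema for the chat template. `to_tool_schema` below is a hypothetical stdlib-only helper showing the idea, not TRL's actual implementation:

```python
import inspect

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}"

# Hypothetical mapping from Python annotations to JSON schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def to_tool_schema(fn):
    """Hypothetical helper: derive an OpenAI-style tool schema from a typed function."""
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn),
            "parameters": {"type": "object", "properties": props, "required": list(props)},
        },
    }

schema = to_tool_schema(get_weather)
```

In practice the trainer handles this conversion for you; the sketch only shows what a tool "looks like" to the model.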
v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in `trl vllm-serve` (Qwen3 MTP / Eagle3 drafts), 12 more KTO → DPO alignment PRs (KTO promotion to stable is now in reach), three more `{% generation %}` chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.
Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
reacted to qgallouedec's post with 🔥 38 minutes ago