Tourist cities
#15 opened 1 day ago
by
lolisponce

Model collapse after SFT
1
#14 opened 4 days ago
by
Banjiuyufen

Vocab missing tool-related strings in chat template, poor performance with tools
#13 opened 4 days ago
by
mattjcly
Can you please release how you post-trained Qwen3 on DeepSeek?
2
#12 opened 8 days ago
by
ZeroWw
Tried it, but not as good as expected.
3
#11 opened 9 days ago
by
kk3dmax
Does the /no_think tag no longer work?
4
#10 opened 9 days ago
by
loong
Any plans for a Qwen3-32B model?
👍
13
7
#9 opened 9 days ago
by
wanghf
BTW, for programmers, the `Gemma` series is best at helping you write comments, docstrings, and documents.
🔥
1
1
#8 opened 9 days ago
by
DOFOFFICIAL

DeepSeek-R1-Lite
❤️
🚀
19
7
#6 opened 9 days ago
by
Dampfinchen
generation_config.json is missing
👀
👍
2
#5 opened 9 days ago
by
Doctor-Chad-PhD

Model broken
👍
3
8
#4 opened 9 days ago
by
sm54
Any plans for the Gemma series? ;-;
❤️
4
4
#2 opened 9 days ago
by
Nakdesu

Any plans for a 30B-A3B model?
🔥
30
7
#1 opened 9 days ago
by
xxx777xxxASD
