how did you do it?
#4 opened 1 day ago by ehartford

compare to qwen3-8b and qwen3-14b
#3 opened 4 days ago by decem

Could the same distillation technology be used to create a draft model for DeepSeek R1 0528?
#2 opened 4 days ago by BernardH
Multilingual?
#1 opened 5 days ago by AaronFeng753