Subsection 2.3 of the paper mentions a distillation step from the 671B model into a 7B model.
Is the released model the result of this distillation step?
Thanks :)
Curious whether you tried training smaller models on CoT data, e.g. 7B/32B/235B?