Is the 7B model the distilled version of the 671B model?

#3
by vadimkantorov

Subsection 2.3 of the paper mentions a distillation step from the 671B model into a 7B model.

Is the released model the result of this distillation step?

Thanks :)


I'm also curious whether you tried training smaller models on the CoT data, e.g. 7B, 32B, or 235B?
