Chinchilla paper actually shows that for a fixed compute budget, it is better to train a smaller model on more data rather than training a larger model for fewer steps.
tm23
tm23hgf
AI & ML interests
None yet
Recent Activity
new activity
16 days ago
BibbyResearch/3blue1brown-manim:Not a good dataset
commented on
an
article
28 days ago
Mixture of Experts Explained
upvoted
an
article
about 1 month ago
Mixture of Experts Explained
Organizations
None yet