Ernest Perkowski
ernestp56
6 followers · 6 following
AI & ML interests
Large Language Models and Computer Vision.
Recent Activity
authored a paper 5 days ago: AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets
authored a paper 5 days ago: AstroLLaMA: Towards Specialized Foundation Models in Astronomy
reacted to osanseviero's post with ❤️ over 1 year ago:
I finished my model merging experiment day. 🤗 I would love your thoughts on this.

What did I do? I merged the Mistral Instruct 0.1 and 0.2 models using different merging techniques:
- SLERP: spherical linear interpolation (the most popular method)
- MoE: replace some feed-forward layers with MoE layers, using a random gate for now
- Frankenmerge: also known as passthrough, but that name isn't as cool. It concatenates some specified layers, ending up with a different number of params; in my case, I went from 7B to 9B.

Note: merging is not building an ensemble of models. You can read more about merging techniques at https://huggingface.co/blog/mlabonne/merge-models

Results: I built the three models using mergekit (running in an HF Space); it took less than an hour to do all three. https://huggingface.co/collections/osanseviero/mistral-instruct-merges-659ebf35ca0781acdb86bb0a

I'm doing a quick check with the Open LLM Leaderboard. 🚨 The Open LLM Leaderboard is more suitable for pre-trained models than instruct models, but I still thought it would be interesting to look at the insights. 🚨 You can look at the attached image. Some interesting things:
- All three models performed somewhere between 0.1 and 0.2 (congrats to the 140 people who got it right in https://twitter.com/osanseviero/status/1745071548866736171).
- Frankenmerge did terribly on GSM8K. It seems that adding some Mistral 0.1 layers degraded performance a lot; this is worse than even 0.1!
- Otherwise, frankenmerge was decent across HellaSwag, MMLU, and especially TruthfulQA.
- MoE uses random gating, so I expected something right in between 0.1 and 0.2, which was the case.

What do I do with this? Not sure, to be honest! I think doing proper MT-Bench evals would be nice. I also think we should all give mergekit a GitHub star because it's awesome. I would love to have the time to do end-to-end ablation studies, but cool new things are coming up. Let me know if you have any thoughts on the results.
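For readers curious what the SLERP step looks like in code, here is a minimal sketch; it is not the author's script and not mergekit's implementation. It assumes PyTorch is available and that the two checkpoints share the same parameter names and shapes; the helper names slerp and merge_state_dicts and the interpolation factor t are illustrative.

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0 returns tensor `a`, t = 1 returns tensor `b`.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Unit vectors are only used to measure the angle between the two weight vectors.
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    omega = torch.arccos(torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0))
    if omega.abs() < 1e-4:
        # Nearly parallel weights: fall back to plain linear interpolation.
        merged = (1 - t) * a_flat + t * b_flat
    else:
        so = torch.sin(omega)
        merged = (torch.sin((1 - t) * omega) / so) * a_flat + (torch.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape).to(a.dtype)

def merge_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    # Interpolate every tensor the two checkpoints have in common.
    return {name: slerp(sd_a[name], sd_b[name], t) for name in sd_a if name in sd_b}
```

With t = 0.5 the merged weights sit halfway along the arc between the two checkpoints; the merges described in the post were built with mergekit, which also handles tokenizer and config plumbing that this sketch ignores.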
Organizations
Papers (2)
arxiv:2401.01916 (AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets)
arxiv:2309.06126 (AstroLLaMA: Towards Specialized Foundation Models in Astronomy)
Models (0): none public yet
Datasets (0): none public yet