Investigating Regularization of Self-Play Language Models Paper • 2404.04291 • Published Apr 4, 2024 • 1
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published 10 days ago • 60
Performance Gaps in Multi-view Clustering under the Nested Matrix-Tensor Model Paper • 2402.10677 • Published Feb 16, 2024
Do Vision and Language Encoders Represent the World Similarly? Paper • 2401.05224 • Published Jan 10, 2024
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9 • 2
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models Paper • 2502.10250 • Published Feb 14
Falcon Mamba: The First Competitive Attention-free 7B Language Model Paper • 2410.05355 • Published Oct 7, 2024 • 36
Post: Falcon Mamba is now available in llama.cpp! Check out the GGUF files uploaded here: tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a
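A minimal sketch of running one of those GGUF builds locally through the llama-cpp-python bindings, assuming the installed llama.cpp build includes Falcon Mamba support; the repo id and filename below are placeholders, so pick the actual GGUF file from the collection linked above:

```python
# Sketch: download a Falcon Mamba GGUF and run it with llama-cpp-python.
# Repo id and filename are PLACEHOLDERS; use the real entries from the
# collection linked in the post.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

gguf_path = hf_hub_download(
    repo_id="tiiuae/your-falcon-mamba-gguf-repo",   # placeholder repo id
    filename="falcon-mamba-7b-q4_k_m.gguf",         # placeholder filename
)

llm = Llama(model_path=gguf_path, n_ctx=2048)
out = llm("Explain state-space language models in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```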
Post: FalconMamba 7B, a new model from TII (Technology Innovation Institute), is out!
- Blogpost: https://huggingface.co/blog/falconmamba
- Link to collection: tiiuae/falconmamba-7b-66b9a580324dd1598b0f6d4a
- Link to playground: tiiuae/falcon-mamba-playground
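A minimal loading sketch with transformers, assuming a recent release with FalconMamba support, the accelerate package for `device_map="auto"`, and the base checkpoint id `tiiuae/falcon-mamba-7b` from the collection above:

```python
# Sketch: load FalconMamba 7B and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # base checkpoint from the collection
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The Falcon Mamba architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```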
Post: Check out quantized weights from ISTA-DAS Lab directly on their organisation page: ISTA-DASLab, with official weights for AQLM (2-bit quantization) and QMoE (sub-1-bit MoE quantization). Read more about these techniques below:
- AQLM paper: Extreme Compression of Large Language Models via Additive Quantization (2401.06118)
- QMoE paper: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (2310.16795)
Some useful links:
- AQLM repo: https://github.com/Vahe1994/AQLM
- How to use AQLM with transformers: https://huggingface.co/docs/transformers/quantization#aqlm
- How to use AQLM with PEFT: https://huggingface.co/docs/peft/developer_guides/quantization#aqlm-quantizaion
Great work from @BlackSamorez and team!
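A minimal sketch of the transformers + PEFT path linked above: load an AQLM 2-bit checkpoint and attach a LoRA adapter on top of the frozen quantized weights. The repo id, target modules, and LoRA hyperparameters are placeholders, and the `aqlm` package is assumed to be installed alongside transformers and peft:

```python
# Sketch: AQLM-quantized base model + LoRA via PEFT.
# Repo id is a PLACEHOLDER; pick a real AQLM checkpoint from ISTA-DASLab.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "ISTA-DASLab/your-aqlm-2bit-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative LoRA settings; module names depend on the base architecture.
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```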
Post: Try out Mixtral 2-bit on a free-tier Google Colab notebook right now! https://colab.research.google.com/drive/1-xZmBRXT5Fm3Ghn4Mwa2KRypORXb855X?usp=sharing
The AQLM method has recently been introduced on the transformers main branch.
The 2-bit model can be found here: BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch
You can read more about the method here: https://huggingface.co/docs/transformers/main/en/quantization#aqlm
Great work @BlackSamorez and team!
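A minimal sketch of loading that 2-bit checkpoint outside Colab; the repo id comes from the post, while the prompt and generation settings are illustrative, and the `aqlm` package plus a transformers build with AQLM support are assumed:

```python
# Sketch: load the 2-bit AQLM Mixtral checkpoint mentioned in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer("Quantization lets large models", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```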
Distributed Inference and Fine-tuning of Large Language Models Over The Internet Paper • 2312.08361 • Published Dec 13, 2023 • 28