Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 336
Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One By rishiraj • Jun 26 • 38
Article State of open video generation models in Diffusers By sayakpaul and 2 others • Jan 27 • 59
Article How Long Prompts Block Other Requests - Optimizing LLM Performance By tngtech • Jun 12 • 5
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance By tngtech • Apr 16 • 31
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 176
Article Enabling Long Context Training with Sequence Parallelism in Axolotl By axolotl-ai-co and 1 other • Apr 4 • 12
Article SigLIP 2: A better multilingual vision language encoder By ariG23498 and 2 others • Feb 21 • 175
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4, 2024 • 30
Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣 Collection The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬 Collection Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1
Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫 Collection Question & answer, task completion, general SFT and otherwise finetuney data. • 7 items • Updated Sep 13, 2023 • 1
Article How to deploy and fine-tune DeepSeek models on AWS By pagezyhf and 2 others • Jan 30 • 52
Article Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia? By davanstrien • May 7, 2024 • 8
Article Train 400x faster Static Embedding Models with Sentence Transformers By tomaarsen • Jan 15 • 201