Youssef Boulaouane
joseph-bou
AI & ML interests
None yet
Recent Activity
reacted to omarkamali's post with 🚀 6 days ago
I just might have cracked tokenizer-free LLMs. No vocab, no softmax.
I'm training a 22M params LLM rn to test this "thing" and it's able to formulate coherent sentences 🤯
Bear in mind, this is a completely new, tokenizer-free LLM architecture with built-in language universality.
Check the explainer video to understand what's happening. Feedback welcome on this approach!
reacted to Shrijanagain's post with 🚀 8 days ago
We are thrilled to announce the launch of SKT-OMNI-CORPUS-146T-V1, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it’s a mission to decentralize high-grade AI training for regional languages and global knowledge.
💎 Key Highlights:
•• Massive Scale: Targeting a multi-terabyte architecture for 146T-level tokenization.
•• Pure Quality: Curated from 500+ Elite Sources
•• Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-𝕻 series) for seamless distributed training.
🤝 Open for Collaboration!
We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design—let’s build the future together.
Explore the Dataset on Hugging Face:
🔗 https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1
DSR -- 🔗 https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000
#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia liked a Space 9 days ago
victor/dlss-5-anything