All HF Hub posts

SeaWolf-AI posted an update about 23 hours ago
🧬 Darwin-35B-A3B-Opus: The Child That Surpassed Both Parents

What if a merged model could beat both its parents? We proved it can.
Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine, the first evolution system that CT-scans parent models before merging them.
🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus

The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0%. That is a relative gain of 6.9% over Father and 5.9% over Mother (5.8 and 5.0 points, respectively). Not a tradeoff, a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.

How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff, just evolution.
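As a rough illustration of that prescription (this is not the actual Darwin V5 engine, which is unreleased; the parameter-name scheme, function names, and scalar stand-ins for weight tensors are all assumptions for the example):

```python
def merge_models(mother, father, reasoning_layers, dead_experts, alpha=0.9):
    """Build a child state dict from two parents keyed like 'layers.34.mlp'.

    Scalars stand in for weight tensors; a real merge would operate on
    torch tensors with the same arithmetic.
    """
    child = {}
    for name, fw in father.items():
        layer = int(name.split(".")[1])
        if name in dead_experts:
            # Mother's expert was killed by distillation: keep Father's.
            child[name] = fw
        elif layer in reasoning_layers:
            # Transplant Mother's reasoning block at alpha (90%) weight.
            child[name] = alpha * mother[name] + (1 - alpha) * fw
        else:
            # Everywhere else, keep Father's healthy generalist weights.
            child[name] = fw
    return child

father = {"layers.34.mlp": 0.0, "layers.0.mlp": 3.0, "layers.0.experts.2": 1.0}
mother = {"layers.34.mlp": 1.0, "layers.0.mlp": 2.0, "layers.0.experts.2": 5.0}
child = merge_models(mother, father,
                     reasoning_layers=range(34, 39),
                     dead_experts={"layers.0.experts.2"})
print(child["layers.34.mlp"])   # mostly Mother's reasoning block (0.9)
```

The same three-way rule (dead expert, reasoning layer, everything else) is what the post describes; the real system additionally uses the MRI scan to decide which layers and experts fall into each bucket.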

35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0
Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.

🚀 Live Demo: FINAL-Bench/Darwin-35B-A3B-Opus

🏆 FINAL Bench Leaderboard: FINAL-Bench/Leaderboard

📊 ALL Bench Leaderboard: FINAL-Bench/all-bench-leaderboard

Built by VIDRAFT · Supported by the Korean Government GPU Support Program
danielhanchen posted an update about 22 hours ago
A new way to use Unsloth.

Coming soon...
Shrijanagain posted an update 2 days ago
🚀 Be Part of the Bharat AI Revolution! 🇮🇳

Do you want to give India a new identity in the world of AI?

SKT AI Labs is not just a name, it is a mission: to give the country digital strength and to make the dream of "Viksit Bharat" come true.

Why Join Us?

1. The Country's Own AI: We are building models designed specifically for India's needs and languages.

2. Open Collaboration: Visit our Hugging Face repository to see our work, test it, and contribute.

3. Technological Growth: Whether you are a student, a developer, or a tech enthusiast, this is a great opportunity to learn new things and grow with us.

Join here 🔗 sKT-Ai-Labs

Come, let's advance the Bharat AI Revolution together! 💻🔥

#SKTAILabs #DigitalIndia #AIRevolution #ViksitBharat #TechInnovation #JoinTheMission
Ujjwal-Tyagi posted an update 2 days ago
I am sharing my study material for AI & ML. These books are truly a "bible" and give a very strong foundation. I have also included guidance, an introduction, and my master notes in the dataset repo card! I hope you will find them helpful; if you have any queries, just start a discussion and I am always there to help you out!
Ujjwal-Tyagi/ai-ml-foundations-book-collection
reaperdoesntknow posted an update about 20 hours ago
Your Loss Function Has Singularities. Classical Calculus Can't See Them.

Introducing Discrepancy Calculus (DISC): treating training singularities as structure, not noise.

Loss plateaus, mode collapse, catastrophic forgetting, distilled models that know things the teacher never taught: we engineer around these. But what if those singularities are the actual structure of the learning problem?

The core insight: every BV function decomposes into a smooth part (what classical calculus handles), a jump part (capability emergence, loss plateaus breaking), and a Cantor part (ghost imprinting: knowledge transferring through weight-space topology, not gradient signal). Classical analysis sees only the first. DISC sees all three.
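For context, the three-way split comes from the classical structure theorem for BV functions, which decomposes the derivative measure into three mutually singular parts; mapping those parts onto training dynamics is the post's own interpretation:

```latex
% De Giorgi / Ambrosio structure theorem for u in BV:
% the derivative measure splits into three mutually singular pieces.
Du \;=\; \underbrace{\nabla u\,\mathcal{L}^{n}}_{\text{absolutely continuous (smooth)}}
\;+\; \underbrace{(u^{+}-u^{-})\,\nu_{u}\,\mathcal{H}^{n-1}\llcorner J_{u}}_{\text{jump part}}
\;+\; \underbrace{D^{c}u}_{\text{Cantor part}}
```

Here $\nabla u$ is the approximate gradient, $J_u$ the jump set with normal $\nu_u$ and one-sided traces $u^{\pm}$, and $D^{c}u$ the Cantor part, which vanishes for smooth and for piecewise-smooth functions alike.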

The paper proves this isn't alternative notation; it's strictly larger. The Meta-Discrepancy Theorem: where singularities exist, the classical FTC/MVT/chain-rule package is provably impossible.

What it explains:

TopologicalQwen exhibited literary reasoning from physics-only data; the Cantor part explains how. DualMind's Explore→Examine→Response loop operationalizes DISC as inference dynamics. 50 models, 35K+ downloads, all built on this framework.

Paper: Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194), with 8 axioms, proofs, and computational recipes.

Series: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → DISC Foundations

- Roy S. Colca Jr., Convergent Intelligence LLC: Research Division
fffiloni posted an update about 19 hours ago
AniDoc is back 🎉

I've fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
alibidaran posted an update about 19 hours ago
🧠 Introducing Qwen3.5: Cognitive Reasoning Mode

I fine-tuned Qwen2.5 with GRPO to actually think before it answers, not just pattern-match.

Most LLMs mimic reasoning. This one builds a real cognitive path:

📌 Plan → understand the task
🔍 Monitor → reason step by step
✅ Evaluate → verify before answering

Every response follows a strict structured protocol:
<think> <planning> ... <monitoring> ... <evaluation> ... </think>
Then a clean, reasoning-free <output>.

The model self-checks its own structure. If a section is missing or malformed → the response is invalid.

This isn't chain-of-thought slapped on top. The reasoning protocol is baked in via RL.
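A structural self-check like the one described can be sketched in a few lines. The tag order follows the post; the model's actual validation and reward code is not public, so the function and details here are assumptions for illustration:

```python
# Required tags in the order the protocol prescribes. The post shows no
# closing tags for the inner sections, so only order is checked here.
REQUIRED_ORDER = ["<think>", "<planning>", "<monitoring>",
                  "<evaluation>", "</think>", "<output>"]

def is_valid(response: str) -> bool:
    """A response passes only if every required tag appears, in order."""
    pos = 0
    for tag in REQUIRED_ORDER:
        pos = response.find(tag, pos)
        if pos == -1:
            return False          # missing or out-of-order section
        pos += len(tag)
    return True

ok = ("<think> <planning> restate the task <monitoring> step-by-step work "
      "<evaluation> answer checks out </think> <output> 42")
print(is_valid(ok))                              # True
print(is_valid(ok.replace("<monitoring>", "")))  # False: section missing
```

In a GRPO setup, a check like this would typically gate or shape the reward signal, which is what "baked in via RL" amounts to in practice.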

🔗 Full README + inference code below 👇
alibidaran/Qwen_COG_Thinker_Merged

#AI #LLM #Qwen #ReasoningModels #GRPO #OpenSource
sergiopaniego posted an update about 20 hours ago
TRL is officially an adult 🥳

excited to announce TRL v1.0❗️

head to the blog to see how we got here and what's next for this post-training library, designed to keep pace with the field

https://huggingface.co/blog/trl-v1
qgallouedec posted an update about 21 hours ago
TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0: deep Hugging Face integration with a low infrastructure burden.
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl


Read more: hf.co/blog/trl-v1
OzTianlu posted an update 1 day ago
https://github.com/lizixi-0x2F/March
I just released March, an open-source high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.
When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately, duplicating the exact same data over and over again. Pure waste.
March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.
- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries: returns direct pointers into the memory pool, no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than dict O(1) lookup, but the memory savings are worth it in production
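The prefix-sharing idea can be sketched in a few lines of Python. This illustrates only the data structure; March's actual page pool and zero-copy machinery are not shown, and the class and method names are invented for the example:

```python
class TrieKVCache:
    """Toy trie keyed by token ids; each node stands in for one KV entry."""

    def __init__(self):
        self.root = {}   # token_id -> child dict
        self.nodes = 0   # proxy for KV entries actually allocated

    def insert(self, tokens):
        """Walk/extend the trie, allocating only for the unshared suffix."""
        node = self.root
        for t in tokens:
            if t not in node:
                node[t] = {}
                self.nodes += 1
            node = node[t]

    def longest_prefix(self, tokens):
        """How many leading tokens are already cached (a shared-prefix hit)."""
        node, hit = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node, hit = node[t], hit + 1
        return hit

cache = TrieKVCache()
system = list(range(100))            # a 100-token shared system prompt
for turn in range(1000):             # 1000 requests, one unique token each
    cache.insert(system + [1000 + turn])

# Naive per-sequence storage would hold 1000 * 101 = 101,000 entries;
# the trie holds the shared prefix once plus one node per unique suffix.
print(cache.nodes)                   # 1100
```

The O(L) insert and lookup cost (L = sequence length) versus a dict's O(1) whole-key hash is exactly the trade-off the last bullet describes.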