All HF Hub posts

marksverdhei 
posted an update 2 days ago
Poll: Will 2026 be the year of subquadratic attention?

The transformer architecture is cursed by its computational complexity.
It is why you run out of tokens and have to compact. But some would argue that this is a feature, not a bug, and that it is also why these models are so good. We've done a lot of research on making equally good models that are computationally cheaper, but so far none of the approaches has stood the test of time. Or so it seems.
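For intuition, here is a minimal NumPy sketch of scaled dot-product attention. The n x n score matrix is where the O(n^2) time and memory come from; the sizes below are arbitrary illustrative values:

```python
import numpy as np

def attention(Q, K, V):
    # The score matrix is (n, n): every token attends to every token.
    # This is the O(n^2) cost the poll is about.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (n, d)

n, d = 4096, 64  # doubling n quadruples the score matrix
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(attention(Q, K, V).shape)  # (4096, 64)
```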

Please vote, don't be shy. Remember that the Dunning-Kruger effect is very real, so the person who knows less about transformers than you is going to vote anyway. We want everyone's opinion, no matter your confidence.

👍 if you think at least one frontier model* will have no O(n^2) attention by the end of 2026
🔥 if you disagree

* Frontier model: a model that matches or outperforms the flagship Claude, Gemini, or ChatGPT of the time on multiple popular benchmarks
mitkox 
posted an update 2 days ago
I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI I don't have Claude Code's /fast switch, but I have /absurdlyfast:
- 100'499 tokens/second read (yeah, 100k, not a typo) | 811 tok/sec generation
- KV cache: 707'200 tokens
- Hardware: 5+ year-old GPUs, 4x A6K gen 1. It's not the car, it's the driver.

Qwen3 Coder Next AWQ with the KV cache at BF16. It scores 82.1% in C# on a 29-years-in-development codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.
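For scale, a back-of-the-envelope sketch of what a 707'200-token BF16 KV cache costs in memory. The layer and head dimensions below are illustrative assumptions, not Qwen3 Coder Next's actual config:

```python
# Rough KV-cache sizing: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes.
# All model dimensions here are assumed for illustration; check the real config.
layers, kv_heads, head_dim = 48, 8, 128
seq_len, bytes_per_elem = 707_200, 2  # BF16 = 2 bytes

cache_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
print(f"{cache_bytes / 2**30:.1f} GiB")  # ~129.5 GiB under these assumptions
```

That kind of footprint is why multi-GPU rigs and cache-efficient architectures matter for long-context agent swarms.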

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.
Guilherme34 
posted an update about 18 hours ago
Imagine a person with so much potential, stuck in a country where no one knows him and no one gives a fuck about him, and who will probably die soon. Yes, that's me: a coder and AI model maker who is waiting, waiting, and waiting for a chance at an awesome job, because I know I have studied for years for a reason!!! I know I'm not just some random guy with Linux saying "omg, I'm a hacker." I know what I am, and I'm stuck with the thought that if I die, no one will care; I'm sure no one will care. The only two people who know me and love me are my girlfriend and my mother. No business wants me. I just want an awesome job, not any job; I need a good job, a job I deserve after years of helping the open-source AI community. For those who are reading this: I'm part of the mradermacher team, I do a lot of research and make AI models, and I feel the clock is ticking. I won't finish my current projects in time, and I fear that.
FreshmanD 
posted an update 1 day ago
LoongFlow Big News!!! @all

We’ve put AI Agents into a production GPU cluster to handle GPU failure prediction.

Not as a demo. Not as AutoML.
But as an evolving system that designs and improves its own models.

On two GPU types:
– IT21HMDB01-B2: +30% prediction accuracy
– H800: +25% prediction accuracy

The resulting models already meet production standards and are being wired into the ops pipeline.

How it works:
• An ML agent designs the full ML pipeline from scratch
• A Math agent performs targeted evolutionary optimization
• The agents explore, discard, and iterate toward better models; humans don't hand-tune parameters. (A rough sketch of such a loop follows below.)
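The post doesn't publish LoongFlow's internals, but the general pattern (an agent proposing pipeline variants that are scored and evolved generation by generation) can be sketched like this; every name and number here is hypothetical:

```python
import random

def propose_variant(parent):
    # Stand-in for the ML/Math agents: mutate one pipeline hyperparameter.
    child = dict(parent)
    key = random.choice(list(child))
    child[key] *= random.uniform(0.5, 2.0)
    return child

def fitness(pipeline):
    # Stand-in for offline evaluation on historical GPU-failure data;
    # in production this would train the pipeline and score its accuracy.
    return -abs(pipeline["lr"] - 3e-4) - abs(pipeline["threshold"] - 0.7)

population = [{"lr": 1e-3, "threshold": 0.5} for _ in range(8)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]  # discard weak variants
    population = survivors + [propose_variant(random.choice(survivors))
                              for _ in range(4)]  # explore new ones
print(max(population, key=fitness))
```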

This is not offline analysis. GPU failure prediction means:
• heavy assets
• real incidents
• real operational risk
The agents now trigger maintenance before failures happen.

This feels like an early signal: AI agents are starting to take responsibility for infrastructure-level engineering decisions in production systems.

For the ML agent framework, see: https://github.com/baidu-baige/LoongFlow
MikeDoes 
posted an update 1 day ago
Thanks to open datasets, you don't need a massive research lab to build a privacy-preserving AI tool. With the right ingredients, anyone can.

A fantastic new guide shows how the democratization of AI is helping to advance safety. It walks through how to use Google's new fine-tuning API to turn Gemini into a powerful tool for PII anonymization.

This project was powered by two key components:
- An accessible fine-tuning platform from Google.
- High-quality, open-source training data.

We are honored that the author chose the Ai4Privacy pii-masking-200k dataset to provide the crucial data foundation. Our dataset delivered the volume and structure needed to successfully teach a state-of-the-art model how to perform a critical privacy function.
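As an illustration of how such a dataset plugs into fine-tuning, here is a minimal sketch that loads it and builds input/output pairs. The source_text/target_text column names are my reading of the dataset card, so verify them before use:

```python
from datasets import load_dataset

# Load the Ai4Privacy dataset (column names assumed; check the dataset card).
ds = load_dataset("ai4privacy/pii-masking-200k", split="train")

def to_pair(example):
    # Input: raw text containing PII; output: the same text with PII
    # replaced by placeholders such as [FIRSTNAME].
    return {"input": example["source_text"], "output": example["target_text"]}

pairs = ds.map(to_pair, remove_columns=ds.column_names)
print(pairs[0])
```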

This is the future we're working towards: powerful platforms combined with open, safety-focused data to create tools that benefit everyone. Kudos to the author for showcasing what's possible!

🔗 Read the full step-by-step guide: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/

🚀 Stay updated on the latest in privacy-preserving AI—follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/

#AIforGood #DemocratizeAI #DataPrivacy #Anonymization #OpenSource #LLM #Ai4Privacy
danielhanchen 
posted an update about 3 hours ago
We collaborated with Hugging Face to enable you to train MoE models 12× faster with 35% less VRAM via our new Triton kernels (no accuracy loss). 🤗

Train gpt-oss locally on 12.8GB VRAM with our free notebooks: https://unsloth.ai/docs/new/faster-moe
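Not a substitute for the linked notebooks, but the usual Unsloth loading pattern looks roughly like this; the model id and settings are assumptions, and the official docs are authoritative for the MoE setup:

```python
from unsloth import FastLanguageModel

# Assumed model id and settings; see unsloth.ai/docs for the exact recipe.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumption: check the notebooks
    max_seq_length=4096,
    load_in_4bit=True,  # part of how the ~12.8GB VRAM figure is reached
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```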
paasthaamz 
posted an update about 10 hours ago
test
alexnasa 
posted an update 1 day ago
New functionality at the same LTX-2 HF Space: you can now supply your last frame alongside your first frame to guide the generated video by choosing our frame-interpolation mode.

Try it out: alexnasa/ltx-2-TURBO
dhruv3006 
posted an update 1 day ago
Voiden Blocks: Building APIs Like LEGO

At Voiden, we believe API development should feel like writing clean, reusable code, because it IS code.
That's why everything in Voiden is a Block: the smallest, most flexible piece of your API world. Your endpoints, headers, query params, JSON bodies, even file attachments are all individual Blocks you can add, remove, reorder, and reuse.
Think of it as LEGO for HTTP: snap Blocks together to build clean, modular API requests that are easy to read, maintain, and share.
But it gets better. With Reusable Blocks, you create a Block once and import it everywhere you need it, just like importing functions in your code. Update the Block once, and the change ripples through all your requests automatically. (A concept sketch follows below.)
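The Block idea maps cleanly onto plain code. This Python sketch illustrates the concept only; it is not Voiden's actual syntax, and every name in it is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    # One reusable fragment of a request: header, query param, body, etc.
    kind: str
    data: dict

@dataclass
class Request:
    method: str
    url: str
    blocks: list = field(default_factory=list)

# Define a Block once...
auth_header = Block("header", {"Authorization": "Bearer ${API_TOKEN}"})
json_body = Block("body", {"name": "demo"})

# ...then reuse it across requests; updating auth_header "ripples
# through" every request that includes it.
create = Request("POST", "https://api.example.com/items", [auth_header, json_body])
delete = Request("DELETE", "https://api.example.com/items/1", [auth_header])
print(create, delete, sep="\n")
```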

Why this matters:
- Save time and energy: no more repeating the same thing over and over
- Stay consistent: headers, params, and auth always match across your projects
- Keep your workspace clean and focused: add only the Blocks you need
- Collaborate with confidence: modular, maintainable API workflows

Voiden brings developer best practices (modularity, reusability, and version control) to API development and testing, helping you build smarter and faster.
Want to see how Blocks can transform your API workflow?

Check out Voiden, open source and ready to use.


GitHub: https://github.com/VoidenHQ/voiden