
All HF Hub posts

hesamation 
posted an update about 18 hours ago
longer context doesn't guarantee better responses. it can even hurt your llm/agent. a 1M-token context window doesn't automatically make models smarter; it's not about the size, it's how you use it.

here are 4 types of context failure and why each one happens:

1. context poisoning: if a hallucination finds its way into your context, the agent will rely on that false information to make its future moves. for example, if the agent hallucinates the "task description", all of its planning to solve the task will also be corrupt.

2. context distraction: when the context becomes too bloated, the model focuses too much on it rather than coming up with novel ideas or following what it learned during training. as the Gemini 2.5 Pro technical report points out, as the context grows well beyond 100K tokens, "the agent showed a tendency toward favoring repeating actions from its vast history rather than synthesizing novel plans".

3. context confusion: everyone lost it when MCPs became popular; it seemed like AGI had been achieved. I suspected something was wrong, and there was: it's not just about providing tools. bloating the context with tool metadata derails the model from selecting the right one! even if you can fit all your tool metadata in the context, as the number of tools grows, the model gets confused about which one to pick.

4. context clash: if you exchange messages with a model step by step, providing information as you go, chances are you'll get worse performance than if you provide all the useful information at once. once the model's context fills with wrong information, it's more difficult to guide it to embrace the right info. agents pull information from tools, documents, user queries, etc., and some of that information can contradict the rest, which is not good news for agentic applications.
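one common mitigation for context confusion is to filter the tool list before each call instead of dumping every tool's metadata into the context. a minimal sketch, using naive word overlap as a stand-in for the embedding-based relevance filtering used in practice (tool names and descriptions here are hypothetical):

```python
def select_tools(query, tools, top_k=3):
    """Rank tools by word overlap between the query and each tool's
    description, and keep only the top_k most relevant ones."""
    q_words = set(query.lower().split())
    scored = sorted(
        tools,
        key=lambda t: len(q_words & set(t["description"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]

# hypothetical tool registry
tools = [
    {"name": "search_web", "description": "search the web for a query"},
    {"name": "read_file", "description": "read a local file from disk"},
    {"name": "send_email", "description": "send an email to a recipient"},
]

# only the best-matching tool's metadata would go into the context
picked = select_tools("search the web for llm papers", tools, top_k=1)
```

the point is the shape of the fix, not the scoring: the model only ever sees a handful of plausibly relevant tools per step.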

check out this article by Drew Breunig for a deeper read: https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html?ref=blog.langchain.com
AdinaY 
posted an update 2 days ago
Qwen3-Coder 💻 agentic code model by Alibaba Qwen team🚀

Qwen/Qwen3-Coder-480B-A35B-Instruct

✨ 480B total, 35B activated MoE
✨ Agentic Coding + Browser Use → Top code model performance
✨ 256K context (up to 1M via YaRN) for repo-scale understanding
andito 
posted an update 1 day ago
Many VLMs claim to process hours of video. But can they follow the story?🤔
Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳

We test three skills that matter for real-world use:
🔎 Localized Retrieval: Find a specific action.
🧩 Information Synthesis: Piece together scattered clues.
🏃 Fine-Grained Perception: Analyze detailed motion (e.g., count how many times a person swings an axe).
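scoring a benchmark like this comes down to aggregating per-skill accuracy so you can see exactly where a model breaks. a minimal sketch, assuming a hypothetical record schema (not TimeScope's actual result format):

```python
from collections import defaultdict

def accuracy_by_skill(results):
    """Aggregate accuracy per skill from a list of
    {"skill": str, "correct": bool} records."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["skill"]] += 1
        hits[r["skill"]] += r["correct"]  # bool counts as 0/1
    return {s: hits[s] / totals[s] for s in totals}

# hypothetical per-question results
results = [
    {"skill": "localized_retrieval", "correct": True},
    {"skill": "localized_retrieval", "correct": False},
    {"skill": "information_synthesis", "correct": True},
]
scores = accuracy_by_skill(results)
```

grouping the same records by video duration instead of skill gives the performance-vs-length curves discussed below.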

The results are in, and they're revealing. Only Gemini 2.5 Pro handles 1-hour-long videos.
Performance drops sharply with duration, proving that long video understanding is still challenging. We've found the breaking points—now the community can start fixing them.📈

Want to learn more? TimeScope is 100% open-source. Benchmark your model and help us build the next generation of video AI.

📖 Blog:
https://huggingface.co/blog/timescope-video-lmm-benchmark
👩‍💻 Leaderboard & Demo: Apollo-LMMs/TimeScope
📊 Dataset: Apollo-LMMs/TimeScope
⚙️ Eval Code: https://github.com/EvolvingLMMs-Lab/lmms-eval
AdinaY 
posted an update 3 days ago
KAT-V1 🔥 a LLM that tackles overthinking by switching between reasoning and direct answers, by Kuaishou.

Kwaipilot/KAT-V1-40B

✨ 40B
✨ Step-SRPO: smarter reasoning control via RL
✨ MTP + Distillation: efficient training, lower cost
mitkox 
posted an update about 19 hours ago
I run Qwen3-Coder 480B locally on my Z8, with a 1-million token context window. It’s the equivalent of parallel-parking a Nimitz-class carrier in a kiddie pool. Thanks to whatever dark pact the llama.cpp, CUDA, and kernel folks signed, hybrid inferencing + VRAM↔RAM offload let me stream the model’s synapses across Xeon, RAM, and four lonely A6000s without summoning either the OOM killer or a small house fire.
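For anyone wanting to try the same trick, the hybrid setup boils down to partial layer offload in llama.cpp. A sketch of the launch command: the flag names are real llama-server options, but the model filename, layer count, and context size are illustrative, not the author's actual invocation.

```shell
# Hybrid CPU/GPU inference with llama.cpp's llama-server.
#   -m             : local GGUF weights (hypothetical filename)
#   -c             : requested context window in tokens
#   -ngl           : layers offloaded to VRAM; the rest stay in system RAM
#   --tensor-split : ratio for spreading offloaded layers across the 4 GPUs
llama-server -m qwen3-coder-480b-q4.gguf -c 1000000 -ngl 40 --tensor-split 1,1,1,1
```

Tuning `-ngl` is the whole game: too high and the OOM killer shows up, too low and the Xeon does all the work.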
prithivMLmods 
posted an update about 23 hours ago
olmOCR [Allen AI] just got an upgrade! 📈🧑‍🍳

allenai/olmOCR-7B-0725, fine-tuned with allenai/olmOCR-mix-0225 on top of Qwen/Qwen2.5-VL-7B-Instruct, pushes the boundaries of OCR technology. It takes a single document image as input, with the longest side resized to 1288 pixels. It's a high-quality, openly available approach to optical character recognition for parsing PDFs and other complex documents.
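The longest-side resize the model card describes is simple aspect-ratio-preserving scaling. A sketch of the dimension math (not olmOCR's actual preprocessing code):

```python
def longest_side_dims(width, height, target=1288):
    """Output size when the longest side is scaled to `target` pixels,
    preserving aspect ratio."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)

# a 2576x1000 page scales down by a factor of 0.5
dims = longest_side_dims(2576, 1000)
```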

Try the demo here: prithivMLmods/Multimodal-OCR

✨ Model: allenai/olmOCR-7B-0725
✨ Model [fp8]: allenai/olmOCR-7B-0725-FP8
✨ Multimodal Implementations Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the model card of the respective model.
danielhanchen 
posted an update 1 day ago
wenhuach 
posted an update about 24 hours ago
🚀 AutoRound (https://github.com/intel/auto-round) now supports GGUF export & custom bit settings!

We're excited to announce that AutoRound now supports:
✅ GGUF format export – for seamless compatibility with popular inference engines.
✅ Custom bit settings – tailor quantization to your needs for optimal performance.
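To see what a bit setting trades off, here's a toy simulation of symmetric round-to-nearest quantization at different bit widths. This is only an illustration of the bits-vs-error trade-off, not AutoRound's actual algorithm (which learns the rounding):

```python
def fake_quantize(weights, bits=4):
    """Snap each weight to the nearest point on a symmetric grid with
    2**(bits-1) - 1 levels per side, then map back to float."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

w = [1.0, -0.5, 0.25]
q4 = fake_quantize(w, bits=4)   # fine grid, small error
q2 = fake_quantize(w, bits=2)   # coarse grid, much larger error
```

Dropping from 4 to 2 bits shrinks the grid from 15 levels to 3, which is why 2-bit exports usually mix in higher-precision layers (as the q2ks-mixed models above do).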

Check out these newly released models:
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q4km-AutoRound
🔹Intel/Qwen3-235B-A22B-Instruct-2507-gguf-q2ks-mixed-AutoRound
🔹Intel/Kimi-K2-Instruct-gguf-q2ks-mixed-AutoRound

Stay tuned! An even more advanced algorithm for some configurations is coming soon.
YerbaPage 
posted an update 2 days ago
How to achieve a 100% pass rate on HumanEval? 🔥

Tired of LLMs failing on complex bugs? 🤔 Meet MGDebugger, which just hit 100% accuracy on HumanEval using the DeepSeek-R1 model. 🚀

✨ Demo: learnmlf/MGDebugger
📝 Paper: From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging (2410.01215)
💻 Code: https://github.com/YerbaPage/MGDebugger

HumanEval may be retired, but we're ready for the next challenge in more complex scenarios! You can also take a look at this repo for a collection of awesome repo-level coding tasks:

🖥️ https://github.com/YerbaPage/Awesome-Repo-Level-Code-Generation
dhruv3006 
posted an update 2 days ago