smolagents

Team

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

abidlabs authored a paper about 2 months ago

Persistent Anti-Muslim Bias in Large Language Models

abidlabs authored a paper about 2 months ago

STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map

abidlabs authored a paper about 2 months ago

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

View all activity

victor

posted an update 17 days ago

Post

2814

Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

evalstate

posted an update about 2 months ago

Post

2404

Hugging Face MCP Server v0.2.46
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Add "discover" to Dynamic Space tool. Recommend deselecting "space_search" if using dynamic spaces.

evalstate

posted an update about 2 months ago

Post

2975

Hugging Face MCP Server v0.2.45
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- New! Experimental dynamic_space tool.
- Default Image Generator changed to Qwen-Image-Fast

abidlabs

authored 3 papers about 2 months ago

evalstate

posted an update about 2 months ago

Post

2216

Hugging Face MCP Server v0.2.40
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Improved progressive disclosure and descriptions for Jobs tool.

abidlabs

posted an update 2 months ago

Post

8855

Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model is not how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.

8 replies

evalstate

posted an update 2 months ago

Post

353

Hugging Face MCP Server v0.2.35
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$HF_TOKEN is expanded in Jobs Secrets environment variables.

andito

authored a paper 2 months ago

FineVision: Open Data Is All You Need

Paper • 2510.17269 • Published Oct 20, 2025 • 72

evalstate

posted an update 2 months ago

Post

350

Hugging Face MCP Server v0.2.33
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Allow discovery of Product Documentation Library via the Search tool.

andito

posted an update 3 months ago

Post

1914

Finally, our new paper is out! "𝗙𝗶𝗻𝗲𝗩𝗶𝘀𝗶𝗼𝗻: 𝗢𝗽𝗲𝗻 𝗗𝗮𝘁𝗮 𝗜𝘀 𝗔𝗹𝗹 𝗬𝗼𝘂 𝗡𝗲𝗲𝗱"! 🥳
FineVision: Open Data Is All You Need (2510.17269)

If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.
We wanted to change that.

FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.

In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks

My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets.
NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!

🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉HuggingFaceM4/FineVision_full_shuffled

It’s ready to stream, so you can start training your own models right away:

from datasets import load_dataset
d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
print(next(iter(d)))

A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!

merve

posted an update 3 months ago

Post

7513

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages

4 replies

thomwolf

authored a paper 3 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 119

evalstate

posted an update 3 months ago

Post

262

Hugging Face MCP Server v0.2.31
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- OpenAI Apps SDK Support for Gradio Content Generation spaces

lvwerra

authored a paper 3 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 36

evalstate

posted an update 3 months ago

Post

347

Hugging Face MCP Server v0.2.29
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Document Fetch tool now returns markdown direct from HF Docs rather than processed.
- Relative URLs accepted for Fetch

abidlabs

posted an update 3 months ago

Post

1551

What other features would you like to see on the Trackio Dashboard? ( gradio-templates/trackio-dashboard)

merve

posted an update 3 months ago

Post

6810

large AI labs open-sourced a ton of models last week 🔥
here's few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking

2 replies

merve

posted an update 4 months ago

Post

3392

IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter but also can do document question answering, understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗

AI & ML interests

Recent Activity

Team members 18

smolagents's activity