I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!
The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.
# New "Filterer" Process
The new process adds another LLM call before the Researcher. The Filterer (also Qwen3-235B) receives the query summary and the original 50 pages (with minimal context from each) and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs of the relevant pages, which are re-fetched with more context and sent to the Researcher.
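In rough outline, the Filterer stage looks something like the sketch below. This is a minimal illustration, not the actual implementation: the prompt wording, the page dict fields, and the `fetch_page` helper are assumptions.

```python
import os
import re
from huggingface_hub import InferenceClient

client = InferenceClient(provider="cerebras", api_key=os.environ["HF_READ_TOKEN"])

def filter_pages(summary: str, pages: list[dict]) -> list[str]:
    """Ask the Filterer which low-context pages deserve a full re-fetch."""
    listing = "\n".join(f"- {p['url']}: {p['snippet'][:300]}" for p in pages)
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-Thinking-2507",
        messages=[
            {"role": "system", "content": "Return only the URLs relevant to the research topic, one per line."},
            {"role": "user", "content": f"Research topic:\n{summary}\n\nCandidate pages:\n{listing}"},
        ],
    )
    return re.findall(r"https?://\S+", resp.choices[0].message.content)

# relevant_urls = filter_pages(summary, low_context_pages)
# full_pages = [fetch_page(url, max_chars=8000) for url in relevant_urls]  # re-fetch with more context
```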
# Researcher Context
The Researcher now gets only the relevant webpages, then begins writing the report. When testing with 50 initial results, the Researcher would typically end up with 10-20 pages of relevant context.
Everything still completes in under a minute, thanks entirely to Cerebras inference; the tool now finishes about 35-45 seconds after it's run.
It's also worth noting that both the Filterer and the Researcher are now provided the current time/date before they see the content, reducing hallucinations caused by knowledge cutoffs.
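Injecting the date is a one-liner, but worth showing; something along these lines (variable names are illustrative):

```python
from datetime import datetime, timezone

# Tell both models what "now" is so they don't reason from their training cutoff.
now = datetime.now(timezone.utc).strftime("%A, %Y-%m-%d %H:%M UTC")
system_prompt = f"Current date/time: {now}\n\n{base_instructions}"
```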
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tools writes a summary of your request and up to five DuckDuckGo search queries (up to 50 results total). Each of the webpages found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, the searched queries, and the fetched links, then writes a thorough research report. The agent using the tool then provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity sys prompts.
# Deep_Research Tool
It accomplishes everything in under a minute so it doesn't hit MCP's 60 second timeout, mostly thanks to Cerebras. The only thing required to make this work is an HF_READ_TOKEN for inference.
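End to end, the tool call looks roughly like this. It's a sketch under assumed helper names (`ddg_search`, `fetch_page`, `build_report_prompt`); the real tool differs in the details.

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(provider="cerebras", api_key=os.environ["HF_READ_TOKEN"])

def deep_research(summary: str, queries: list[str]) -> str:
    # 1. Up to five DuckDuckGo searches, up to 50 results in total.
    results = []
    for q in queries[:5]:
        results += ddg_search(q, max_results=10)

    # 2. Fetch each result page (low context per page at this stage).
    pages = [fetch_page(r["href"]) for r in results[:50]]

    # 3. Hand the summary, queries, and fetched pages to the Researcher.
    prompt = build_report_prompt(summary, queries, pages)
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-Thinking-2507",
        messages=[{"role": "user", "content": prompt}],
    )
    report = resp.choices[0].message.content

    # 4. Save the report so the agent can link research_report.txt for the user.
    with open("research_report.txt", "w", encoding="utf-8") as f:
        f.write(report)
    return report
```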
The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for sorting URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before running the big researcher. I'm hell-bent on keeping this all within the scope of one tool call.
# More Fetch/Web Search Improvements
The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results, and results now include the published date (if detected). It also now supports every DDG search type! The default DDG search type is text, but it can now also search news, images, videos, and books.
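A sketch of how the type dispatch and pagination could look with the duckduckgo_search package (not necessarily what the tool actually uses; the books type and output formatting are omitted here):

```python
from duckduckgo_search import DDGS

def ddg_search(query: str, search_type: str = "text", max_results: int = 10, page: int = 1):
    """Dispatch a query to one of DDG's search backends and return one page of results."""
    with DDGS() as ddgs:
        backend = {
            "text": ddgs.text,
            "news": ddgs.news,
            "images": ddgs.images,
            "videos": ddgs.videos,
        }[search_type]
        # Fetch enough results to cover the requested page, then slice.
        results = list(backend(query, max_results=max_results * page))
    return results[(page - 1) * max_results : page * max_results]
```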
The Fetch_Webpage tool now reports how much of the page has been truncated and a cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can now also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that only returns the URLs found on the full page.
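The cursor mechanism is simple in principle; here's a sketch of the idea (function and parameter names are illustrative, and the CSS-stripping and URL Scraper modes are left out):

```python
import requests
from markdownify import markdownify

def fetch_webpage(url: str, cursor: int = 0, max_chars: int = 4000) -> dict:
    """Return a window of the page as Markdown, plus where to resume reading."""
    html = requests.get(url, timeout=30).text
    md = markdownify(html)
    window = md[cursor : cursor + max_chars]
    next_cursor = cursor + len(window)
    return {
        "content": window,
        "next_cursor": next_cursor,                         # pass back to continue reading
        "truncated_chars": max(0, len(md) - next_cursor),   # how much of the page remains
    }
```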
I have a few updates to my MCP server I wanna share: a new Memory tool, plus improvements to web search & speech generation.
# Memory_Manager Tool
We now have a Memory_Manager tool. Ask ChatGPT to write all its memories verbatim, then tell gpt-oss-20b to save each one using the tool, then take them anywhere! It stores memories in a memories.json file in the repo, no external database required.
The Memory_Manager tool is currently hidden from the HF space because it's intended for local use. It's enabled by providing an HF_READ_TOKEN in the env secrets, although it doesn't actually use the key for anything. There's probably a cleaner way of ensuring memory is only used locally; I'll come back to this.
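The storage side is deliberately simple; something in the spirit of the sketch below (the real tool's actions and schema may differ):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memories.json")

def load_memories() -> list[dict]:
    """Read all saved memories, or an empty list if none exist yet."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def save_memory(text: str) -> None:
    """Append one memory and write the whole list back to memories.json."""
    memories = load_memories()
    memories.append({"id": len(memories) + 1, "memory": text})
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
```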
# Fetch & Websearch
The Fetch_Webpage tool has been simplified a lot. It now converts the page to Markdown and returns the page with three length settings (Brief, Standard, Full). This is a lot more reliable than the old custom extraction method.
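A sketch of the idea, assuming an HTML-to-Markdown converter like markdownify and purely illustrative character budgets for the three settings:

```python
import requests
from markdownify import markdownify

# Illustrative character limits; None means return the whole page.
LENGTH_PRESETS = {"Brief": 1_500, "Standard": 6_000, "Full": None}

def fetch_webpage(url: str, length: str = "Standard") -> str:
    md = markdownify(requests.get(url, timeout=30).text)
    limit = LENGTH_PRESETS[length]
    return md if limit is None else md[:limit]
```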
The Search_DuckDuckGo tool has a few small improvements. The input is easier for small models to get right, and the output is more readable.
# Speech Generation
I've added the remaining voices for Kokoro-82M; it now supports all 54 voices, covering all accents/languages.
I also removed the 30 second cap by making sure it computes all chunks in sequence, not just the first. I've tested it on outputs that are ~10 minutes long. Do note that when used as an MCP server, the tool will time out after 1 minute; nothing I can do about that right now.
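Removing the cap mostly means not stopping after the first generated chunk. A sketch assuming the kokoro package's KPipeline interface and its documented 24 kHz output (voice and file names are illustrative):

```python
import numpy as np
import soundfile as sf
from kokoro import KPipeline

pipeline = KPipeline(lang_code="a")  # American English

def synthesize(text: str, voice: str = "af_heart", out_path: str = "speech.wav") -> str:
    chunks = []
    # The pipeline yields one audio chunk per text segment; collect them all
    # instead of stopping after the first one.
    for _, _, audio in pipeline(text, voice=voice):
        chunks.append(np.asarray(audio, dtype=np.float32))
    sf.write(out_path, np.concatenate(chunks), 24000)
    return out_path
```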
# Other Thoughts
Lots of MCP use cases involve manipulating media (image editing, ASR, etc.). I've avoided adding tools like this so far for two reasons:
1. Most of these solutions would require assigning it a ZeroGPU slot.
2. The current process of uploading files like images to a Gradio space is still a bit rough. It's doable but requires additional tools.
Both of these points make it a bit painful for local usage. I'm open to suggestions for other tools that rely on text.
Did you know you can use AI-Toolkit by Ostris (https://github.com/ostris/ai-toolkit) to train AI image and video models directly inside a Hugging Face space?
The benefit of doing so is that you get a nice UI, if you don't want to deal with JSON files and CLI shenanigans!
I have created a ready-to-use template you can deploy to your own HF Space to train generative models in a few clicks.
All you have to do is duplicate my space into your own private space by going here: jbilcke-hf/ai-toolkit
This space requires a good GPU and, most importantly, persistent storage, as everything is stored in /data.
I built a general-use MCP space ~ Fetch webpages, DuckDuckGo search, Python code execution, Kokoro TTS, Image Gen, Video Gen.
# Tools
1. Fetch webpage
2. Web search via DuckDuckGo (very concise, low excess context)
3. Python code executor
4. Kokoro-82M speech generation
5. Image Generation (use any model from HF Inference Providers)
6. Video Generation (use any model from HF Inference Providers)
The first four tools can be used without any API keys whatsoever. DDG search is free, and the code execution and speech gen are done on CPU. Having an HF_READ_TOKEN in the env variables will show all tools; if no key is present, the Image/Video Gen tools are hidden.
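The gating is just an environment check when the interfaces are assembled. Roughly like this sketch, where the tool functions are placeholders defined elsewhere, not the space's actual code:

```python
import os
import gradio as gr

has_token = bool(os.getenv("HF_READ_TOKEN"))

# fetch_webpage, web_search, run_python, generate_speech, generate_image,
# generate_video are placeholder tool functions defined elsewhere.
tools = [
    gr.Interface(fn=fetch_webpage, inputs="text", outputs="text", title="Fetch Webpage"),
    gr.Interface(fn=web_search, inputs="text", outputs="text", title="DuckDuckGo Search"),
    gr.Interface(fn=run_python, inputs="text", outputs="text", title="Code Executor"),
    gr.Interface(fn=generate_speech, inputs="text", outputs="audio", title="Speech Gen"),
]
if has_token:  # Image/Video Gen need HF Inference Providers, so hide them without a key
    tools += [
        gr.Interface(fn=generate_image, inputs="text", outputs="image", title="Image Gen"),
        gr.Interface(fn=generate_video, inputs="text", outputs="video", title="Video Gen"),
    ]

demo = gr.TabbedInterface(tools, [t.title for t in tools])
demo.launch(mcp_server=True)
```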
Very impressed with this lil guy and it deserves more downloads. It's based on the original version of Qwen3-4B, but I find that it questions reality way less often. Jan-v1 seems to think that everything it sees is synthetic data and constantly gaslights me.
Anyone know how to reset Claude web's MCP config? I connected mine when the HF MCP first released with just the default example spaces added. I added lots of other MCP spaces but Claude.ai doesn't update the available tools... "Disconnecting" the HF integration does nothing, deleting it and adding it again does nothing.
Refreshing tools works fine in VS Code because I can manually restart it in mcp.json, but claude.ai has no such option. Anyone got any ideas?
Open Source Avengers, Assemble! Ask an expert AI agent team to solve complex problems together 🔥
Consilium brings together multiple agents that debate and use live research (web, arXiv, SEC) to reach a consensus. You set the strategy, they find the answer.
Note: some things like persistent models/storage/custom LoRAs might not be fully working out of the box. If you need those, you might have to dig into the Wan2GP codebase and see how to tweak the storage folder. Happy hacking!
PSA for anyone using Nymbo/Nymbo_Theme or Nymbo/Nymbo_Theme_5 in a Gradio space ~
Both of these themes have been updated to fix some of the long-standing inconsistencies that have existed since the transition to Gradio v5. Textboxes are no longer bright green, and in-line code is readable now! Both themes are now visually identical across versions.
If your space is already using one of these themes, you just need to restart your space to get the latest version. No code changes needed.
Explain like I'm 5 the latest take from @thomwolf on X about Dario's essay on DeepSeek:
—› Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.
If only one company controls AI, everything stops if they have a problem—like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.
AI isn’t just a race between two countries; it’s a team effort around the world. By sharing, we move faster and create safer technology for everyone. — 🤗