EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, just landed:
Beck-8B as the base model, 13,000 steps on an educational dataset. Time to go further and build more 🥰 s3nh/EduHelp_Beck_8B Thanks to @basilic_ai for the compute <3
Just tried to create an educational assistant for younger people who can struggle to visualise 'what is this sorcery all about'. It's the first step of my spare-time projects: SFT on Qwen3-8B.
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
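For anyone curious what that kind of setup looks like in code, here is a minimal sketch of a LoRA SFT run with the 🤗 trl + peft libraries. The rank, target modules, batch settings, and the assumption that the dataset exposes a plain "text" column are illustrative guesses, not the exact EduHelper recipe.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Dataset named in the post; assuming it exposes a plain "text" column
# (map/format it first if your copy uses a different schema).
dataset = load_dataset("ajibawa-2023/Education-Young-Children", split="train")

# LoRA: train small low-rank adapter matrices instead of all 8B weights.
# Rank, alpha, and target modules below are illustrative defaults.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-8B",  # base model named in the post
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="eduhelper-qwen3-8b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        logging_steps=50,
    ),
)
trainer.train()
trainer.save_model("eduhelper-qwen3-8b-lora")  # saves only the LoRA adapter
```

The nice part of the LoRA route is that only the small adapter matrices are trained and saved, so the run needs far less GPU memory than full fine-tuning of an 8B model.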
🎮 Live Model Demo: upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo
Built in a garage, funded by pre-orders, no VC. Now we’re scaling to 1,000 installer units.
We’re giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.
👇 Drop “EUSKERA” in the comments if you want an invite, tag a friend who still thinks Alexa is “convenient,” and smash ♥️ if AI should belong to people - not servers.
a senior engineer at google just dropped a free 400-page book on google docs for review: agentic design patterns.
the table of contents looks like everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it
Just wanted to announce 🏭 SmolFactory: it's the quickest and best way to fine-tune SmolLM3 and GPT-OSS-20B on Hugging Face!
Basically, it's an app you can run on Hugging Face by duplicating the Space and running your training directly on Hugging Face GPUs.
It helps you select datasets and models, fine-tune your model, gives you an experiment tracker you can check from your mobile phone, pushes your model card, and even automatically builds a demo on Hugging Face so you can test it out directly when training is done!
longer context doesn't guarantee better responses; it can even hurt your llm/agent. a 1M context window doesn't automatically make models smarter: it's not about the size, it's how you use it.
here are 4 types of context failure and why each one happens:
1. context poisoning: if a hallucination finds its way into your context, the agent will rely on that false information for its future moves. for example, if the agent hallucinates the "task description", all of its planning to solve the task will also be corrupted.
2. context distraction: when the context becomes too bloated, the model focuses on it rather than coming up with novel ideas or following what it learned during training. as the Gemini 2.5 Pro technical report points out, once the context grows well beyond 100K tokens, "the agent showed a tendency toward favoring repeating actions from its vast history rather than synthesizing novel plans".
3. context confusion: everyone lost it when MCPs became popular; it seemed like AGI had been achieved. I suspected something was wrong, and there was: it's not just about providing tools. bloating the context with tool metadata derails the model from selecting the right one! even if you can fit all your tool descriptions in the context, as their number grows the model gets confused about which one to pick (see the tool-filtering sketch after this list).
4. context clash: if you converse with a model step by step and provide information as you go, chances are you get worse performance than if you provide all the useful information at once. once the model's context fills with wrong information, it's much harder to guide it back to the right info. agents pull information from tools, documents, user queries, etc., and some of that information can contradict the rest, which is bad news for agentic applications.
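One practical way to push back on context confusion (point 3) is to filter the tool list before it ever reaches the model, instead of dumping every schema into the prompt. Below is a deliberately naive, dependency-free sketch with made-up tool names and a crude word-overlap relevance score; a real setup would usually rank tool descriptions with embeddings, but the idea is the same: only the top-scoring tools' schemas go into the context.

```python
def score(query: str, description: str) -> float:
    """Crude relevance score: fraction of query words found in the tool description."""
    q_words = set(query.lower().split())
    d_words = set(description.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def select_tools(query: str, tools: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the top_k tool names whose descriptions best match the query."""
    ranked = sorted(tools, key=lambda name: score(query, tools[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical tool registry (name -> description), purely for illustration.
tools = {
    "search_web": "search the web for recent information and news",
    "run_sql": "run a sql query against the analytics database",
    "send_email": "send an email to a contact",
    "get_weather": "get the current weather for a city",
}

query = "what was in the news about open-source llms this week?"
selected = select_tools(query, tools, top_k=2)
print(selected)

# Only the selected tools' schemas get injected into the model's context,
# rather than every registered tool, which keeps the tool list small even
# as the registry grows.
```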
just submitted my plugin idea to the G-Assist Plugin Hackathon by @nvidia. Check it out, it's a great way to use a local SLM on a Windows machine to easily and locally get things done! https://github.com/NVIDIA/G-Assist
So at every bio/med/chem meeting i go to, i always ask the same questions: "why are you sharing a gdrive link with me for this?" and "Do you have any plans to publish your model weights and datasets on huggingface?" and finally i got a good answer today that explains everything:
basically there is some kind of government censorship on this (usa, but i'm sure others too): they are told they are not allowed to share, as it is considered a "data leak", which is illegal !!!!
this is terrible! but the good news is that we can do something about it!
60+ Generative AI projects for your resume. grind this GitHub repo if you want to level up:
> LLM fine-tuning and applications
> advanced RAG apps
> Agentic AI projects
> MCP and A2A (new)
this book actually exists for free: "the little book of deep learning". best to refresh your mind about DL basics:
> foundations of machine learning
> how models train
> common layers (dropout, pooling…)
> basic intro to LLMs
and it's actually optimized for reading on mobile.
The best researchers from DeepSeek, OpenAI, Microsoft, and ByteDance have been exploring RL and reasoning in LLMs.
Here are some of their key findings:
1/ RL can further improve distilled models. These models are essentially SFT-tuned on data generated by larger models, and the SFT+RL combo does not disappoint.
This is verified in the DeepSeek-R1 paper.
2/ Both the GRPO and PPO algorithms suffer from length bias: they encourage longer responses. This can be tackled by introducing explicit rewards based on the length of the answer.
3/ Most reasoning research focuses on code and math, but training models on logic puzzles improves them on mathematical tasks too.
This shows that RL-induced reasoning generalizes beyond the specific domain the model was trained on.
Previous research also shows RL can be a great generalizer.
4/ The reasoning might not be induced only by RL; it might already be latent in base models thanks to the pre-training and CoT data they were trained on.
So while RL does wake up the reasoning beast, maybe it's not the only solution (other methods, such as distillation, can get there too).
5/ Back to the length bias: reasoning models tend to generate longer responses for wrong answers, and RL might be the culprit.
When the reward is negative, RL favours longer answers because the penalty gets diluted across more individual tokens, lowering the loss (see the small numeric sketch after these points).
This might explain the "aha" moments!
6/ OpenAI's competitive programming paper showed an interesting finding:
o3 can learn its own test-time strategies (like writing an inefficient but correct solution to verify the answer of an optimized solution)
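To make the length-bias intuition from points 2 and 5 concrete, here is a tiny numeric sketch. The numbers, the per-token averaging assumption, and the length_aware_reward shape are illustrative only, not taken from the papers above.

```python
# If a sequence-level reward is averaged over tokens, a longer wrong answer
# "feels" a smaller penalty per token than a short one.

def per_token_penalty(sequence_reward: float, num_tokens: int) -> float:
    """Penalty each token sees when the sequence reward is averaged over length."""
    return sequence_reward / num_tokens

wrong_short = per_token_penalty(-1.0, 50)   # short wrong answer
wrong_long = per_token_penalty(-1.0, 500)   # long, rambling wrong answer

print(f"short wrong answer: {wrong_short:.4f} per token")  # -0.0200
print(f"long wrong answer:  {wrong_long:.4f} per token")   # -0.0020

# The same -1.0 reward spread over 10x more tokens gives a 10x smaller
# per-token penalty, so length-normalized objectives can quietly favor
# verbosity on failures. One mitigation mentioned in point 2 is an explicit
# length-aware reward; an assumed shape could be:
def length_aware_reward(base_reward: float, num_tokens: int, alpha: float = 0.001) -> float:
    """Illustrative: subtract a small per-token cost so verbosity isn't free."""
    return base_reward - alpha * num_tokens
```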
RL helps LLMs develop their own reasoning & verification methods. The recent article by @rasbt helped me a lot in getting a broad view of the recent research on reasoning models.
He also lists more influential papers on this topic; it's a must-read if you're interested.