Less is More: Recursive Reasoning with Tiny Networks
Abstract
Tiny Recursive Model (TRM) achieves high generalization on complex puzzle tasks using a small, two-layer network with minimal parameters, outperforming larger language models.
Hierarchical Reasoning Model (HRM) is a novel approach using two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language Models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI while trained with small models (27M parameters) on small data (around 1000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM, while using a single tiny network with only 2 layers. With only 7M parameters, TRM obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of the parameters.
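For readers curious what the recursion pattern looks like in practice, here is a minimal, simplified sketch of a TRM-style update loop: a single small network alternately refines a latent state z from the question x and the current answer y, then refines y from z. The module choice (a plain 2-layer MLP here), the dimensions, and the omission of deep supervision and halting are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative sketch of a TRM-style recursion (not the authors' code).

    One small network is reused to (a) refine a latent state z from the
    question x and current answer y, and (b) refine the answer y from z.
    """
    def __init__(self, dim: int = 128, n_latent_steps: int = 6, n_answer_steps: int = 3):
        super().__init__()
        # Hypothetical 2-layer core; the real model uses a tiny attention/MLP block.
        self.core = nn.Sequential(nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.n_latent_steps = n_latent_steps
        self.n_answer_steps = n_answer_steps

    def forward(self, x, y, z):
        for _ in range(self.n_answer_steps):
            # Recursively refine the latent reasoning state given x and y.
            for _ in range(self.n_latent_steps):
                z = self.core(torch.cat([x, y, z], dim=-1))
            # Update the current answer from the refined latent (x masked out).
            y = self.core(torch.cat([torch.zeros_like(x), y, z], dim=-1))
        return y, z

# Usage with toy shapes: x, y, z are embeddings of the question, current answer, latent state.
x, y, z = torch.randn(8, 128), torch.zeros(8, 128), torch.zeros(8, 128)
model = TinyRecursiveSketch()
y_hat, z_hat = model(x, y, z)
```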
Community
Less is More
It seems to perform very well on task tuning, e.g. the Sudoku... Even if it's not AGI or anything like that, this could be revolutionary for small, domain-specific tasks.
would love to see this 'issue' solved: https://github.com/SamsungSAILMontreal/TinyRecursiveModels/issues/2
;)
How do normal supervised learning methods perform on these types of tasks? I'm confused about what differentiates this from a larger forward pass. Or is it simply the learning efficiency?
This sentence in the Conclusion implies this architecture is superior to a massive net without recursive reasoning:
the question of why recursion helps so much compared to using a larger and deeper network remains to be explained; we suspect it has to do with overfitting, but we have no theory to back this explaination (sic)
I don't understand this new trend of comparing models like HRM and TRM with LLMs?
How is that relevant? They're not LLMs. The term "reasoning" here has nothing to do with reasoning in LLMs. I don't even think these techniques are applicable to LLMs, are they?
Like, of course a specialized model, trained for a specific task, is going to perform better than an LLM trained for an entirely different class of problems, right?
For that matter, how is it relevant to test these models on ARC-AGI, which is a benchmark to evaluate the problem solving capabilities of LLMs?
It's apples to oranges, isn't it? Jet airplanes to weather balloons? The weather balloon is obviously way better at monitoring the weather, but jet airplanes have quite a few more uses.
Compare these models against other specialized models: are they significantly smaller or faster?
Give us data we can actually compare.
I honestly have no idea if there's anything truly novel about these models, because you haven't provided any relevant comparison to anything remotely similar. 🤷‍♂️
The ARC-AGI benchmarks evaluate the fluid intelligence of any AI, not only LLMs.
The fact that TRMs perform better than any known architecture on those benchmarks is interesting on its own.
I guess no one grasps the larger truth. This style of recursive reasoning is really belief-state engineering. TRM realizes it at the architectural level, although you can achieve something similar with an extra encoder in normal decoder-only LMs during training. Cool paper. Hope to see more like this, and I hope people extrapolate belief-state engineering to other research facets; one day it will replace RL.
Thanks, I'd never heard of belief-state engineering.
I found this definition from:
Hu, E. S., Ahn, K., Liu, Q., Xu, H., Tomar, M., Langford, A., Jayaraman, D., Lamb, A., and Langford, J. (2024). Learning to Achieve Goals with Belief State Transformers. arXiv preprint arXiv:2410.23506. https://doi.org/10.48550/arXiv.2410.23506
"Informally, a belief state is a sufficient amount of information from the past to predict the outcomes of all experiments in the future, which can be expressed as either a distribution over underlying world states or a distribution over future outcomes."
TRM's comparison to LLMs conflates two separate issues. First, the training regime: TRM uses 1000x data augmentation per example, about 1M effective samples, while Gemini and DeepSeek are zero-shot on ARC-AGI. A Transformer trained the same way would perform similarly. This isn't an architecture advantage, it's a data advantage.

Second, the task structure mismatch is fundamental. ARC-AGI has deterministic solutions in fixed-dimensional spaces. LLMs generate variable-length token sequences with discrete vocabulary constraints. Recursive latent refinement and autoregressive token generation operate in completely different optimization spaces. The recursive approach is conceptually interesting for LLMs (Meta's Coconut explores this), but you can't benchmark it fairly on ARC-AGI. A proper test would integrate TRM's latent recursion into an LLM's hidden states and measure actual language task performance, not geometric puzzle accuracy.
That completely changes the perception here. Data augmentation using domain rules, for example in the case of Sudoku, would basically yield an infinite supply of training data. That level of augmentation is impossible for most practical real-world datasets.
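To make the augmentation point concrete, here is a small illustrative sketch of validity-preserving Sudoku augmentations (digit relabeling, row shuffles within bands, transposition). This is a hypothetical example of domain-rule augmentation, not the paper's actual pipeline.

```python
import numpy as np

def augment_sudoku(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a new valid puzzle from a 9x9 integer grid (0 = blank) using
    validity-preserving symmetries. Illustrative only; not the paper's pipeline."""
    g = grid.copy()
    # 1) Relabel digits with a random permutation of 1..9.
    perm = rng.permutation(np.arange(1, 10))
    filled = g > 0
    g[filled] = perm[g[filled] - 1]
    # 2) Shuffle rows within each 3-row band (keeps rows, columns, and boxes valid).
    for band in range(3):
        order = band * 3 + rng.permutation(3)
        g[band * 3 : band * 3 + 3] = g[order]
    # 3) Randomly transpose (rows and columns swap; still a valid Sudoku).
    if rng.random() < 0.5:
        g = g.T.copy()
    return g

# Usage: rng = np.random.default_rng(0); new_puzzle = augment_sudoku(puzzle, rng)
```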
Very interesting work. The first question that popped into my mind when I saw "recursive" was speed: how fast does each question get processed compared to the other models in this paper?
This, on a tiny, extremely efficient node, will make home AI systems possible for everyone. No need for a huge server rack in the basement.
Hi, I love the work @AlexiaJM. Have you considered adding register tokens? I think register tokens might give some improvement because they alleviate attention noise and outliers. Since the model relies on recursive depth, alleviating attention noise and outliers might lead to more stability. Wondering if this has been tested or tried before.
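For context, "register tokens" here means extra learnable tokens prepended to the sequence so attention has somewhere to dump spurious mass; their outputs are discarded afterwards. Below is a hedged sketch of how one might bolt this onto an attention block; the class name, parameters, and wiring are made up for illustration and are not part of the TRM codebase.

```python
import torch
import torch.nn as nn

class RegisterTokenBlock(nn.Module):
    """Hypothetical add-on (not in the TRM codebase): prepend learnable
    register tokens before self-attention, then drop their outputs, so they
    can soak up attention noise and outlier mass."""
    def __init__(self, dim: int = 128, n_heads: int = 4, n_registers: int = 4):
        super().__init__()
        self.registers = nn.Parameter(0.02 * torch.randn(1, n_registers, dim))
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.n_registers = n_registers

    def forward(self, tokens):                        # tokens: [B, T, D]
        reg = self.registers.expand(tokens.size(0), -1, -1)
        h = torch.cat([reg, tokens], dim=1)           # [B, R + T, D]
        h, _ = self.attn(h, h, h, need_weights=False)
        return h[:, self.n_registers:]                # keep only the original T positions
```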
Here is a bite-sized podcast I created with AI on the paper! https://open.spotify.com/episode/6OIKWXIFjw1a2PHSBR4Fm6?si=fb252fb2000d45e5