Thinking with Generated Images

We introduce Thinking with Generated Images, which enables a single large multimodal model (LMM) to spontaneously generate and reason with intermediate visual thoughts through a native long multimodal thought process.

This model supports vision generation with intermediate visual subgoals.

Please refer to our GitHub repo for more information!

Format: Safetensors
Model size: 7B params
Tensor type: BF16