BAGEL-Canvas

Paper PDF          Project Page          GitHub Code

📖 Overview

BAGEL-Canvas is a Unified Large Multimodal Model (ULMM) with intrinsic Visual Chain-of-Thought (VCoT) capabilities for complex mathematical reasoning. It is the flagship model trained with the [MathCanvas] framework.

Unlike prior models that often fail by generating incorrect (e.g., BAGEL-Zebra-CoT) or strategically poor (e.g., Nano-Banana) visuals, BAGEL-Canvas learns when to generate diagrams and how to reason with them. In the example below, it produces a correct intermediate visual step that simplifies the problem and leads to an elegant, correct solution path, mirroring human problem-solving.

Comparison of different models on a geometry problem. BAGEL-Canvas ("Ours") is the only model that generates a correct and strategically useful diagram to solve the problem.

🚀 Training Recipe

BAGEL-Canvas is trained following the two-stage MathCanvas framework, designed to systematically build its visual reasoning abilities.

  1. Stage I: Visual Manipulation (Pre-training) This foundational stage trains the model to master the core skills of diagram generation and editing. It involves pre-training on a massive 15.2M-pair corpus, which includes:

    • MathCanvas-Imagen (10M pairs): Teaches text-to-diagram generation.
    • MathCanvas-Edit (5.2M pairs): Teaches step-by-step diagram editing and manipulation.
  2. Stage II: Strategic Visual-Aided Reasoning (Fine-tuning) In this stage, the model learns when and how to deploy its visual skills to solve problems. It is fine-tuned on MathCanvas-Instruct, a high-quality dataset of 219K examples featuring interleaved visual-textual reasoning paths, so that it learns to produce a complete VCoT solution.
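
The two stages above can be summarized as a small data structure. This is only an illustrative sketch: the dataset names and sizes come from the recipe above, while the `Dataset` and `Stage` classes and their field names are hypothetical and are not part of the MathCanvas codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical record type; the fields are illustrative only.
    name: str
    num_examples: int
    skill: str


@dataclass(frozen=True)
class Stage:
    # Hypothetical grouping of datasets into one training stage.
    name: str
    datasets: tuple

    def total_examples(self) -> int:
        """Sum the example counts of every dataset in this stage."""
        return sum(d.num_examples for d in self.datasets)


# Sizes are taken from the recipe above (10M + 5.2M pairs, then 219K examples).
STAGE_1 = Stage(
    name="Stage I: Visual Manipulation (pre-training)",
    datasets=(
        Dataset("MathCanvas-Imagen", 10_000_000, "text-to-diagram generation"),
        Dataset("MathCanvas-Edit", 5_200_000, "step-by-step diagram editing"),
    ),
)
STAGE_2 = Stage(
    name="Stage II: Strategic Visual-Aided Reasoning (fine-tuning)",
    datasets=(
        Dataset("MathCanvas-Instruct", 219_000, "interleaved visual-textual reasoning"),
    ),
)


def describe(stages) -> list:
    """Return one human-readable summary line per stage."""
    return [f"{s.name}: {s.total_examples():,} examples" for s in stages]


if __name__ == "__main__":
    for line in describe((STAGE_1, STAGE_2)):
        print(line)
```

Summing the Stage I datasets gives the 15.2M-pair corpus quoted above, which is a quick consistency check on the stated sizes.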

πŸ† Performance

BAGEL-Canvas demonstrates a significant leap in visual mathematical reasoning.

On the challenging [MathCanvas-Bench], our model achieves an 86% relative improvement over strong baselines, establishing state-of-the-art performance among open-source models.
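
A relative improvement is a ratio against the baseline score, not a difference in percentage points. The sketch below shows the arithmetic; the function name `relative_improvement` is hypothetical, and the scores in the example are placeholders chosen to yield 86%, not results reported for MathCanvas-Bench.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline as a fraction, e.g. 0.86 for 86%."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Placeholder scores for illustration only; NOT actual benchmark results.
    baseline_score = 20.0
    new_score = 37.2
    gain = relative_improvement(baseline_score, new_score)
    print(f"relative improvement: {gain:.0%}")
```

With these placeholder numbers the absolute gain is 17.2 points, while the relative gain is 86%, so the two figures should not be confused when reading the leaderboard.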

For detailed results and comparisons with over 20 other leading models, please refer to the official leaderboard.

➡️ View the Official MathCanvas-Bench Leaderboard

📜 Citation

If you find our work useful, please consider citing us!

@misc{shi2025mathcanvasintrinsicvisualchainofthought,
      title={MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning}, 
      author={Weikang Shi and Aldrich Yu and Rongyao Fang and Houxing Ren and Ke Wang and Aojun Zhou and Changyao Tian and Xinyu Fu and Yuxuan Hu and Zimu Lu and Linjiang Huang and Si Liu and Rui Liu and Hongsheng Li},
      year={2025},
      eprint={2510.14958},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.14958}, 
}