---
base_model:
- OpenGVLab/Mini-InternVL-Chat-2B-V1-5
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-text-to-text
---

# MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Repo: [https://github.com/mathllm/MathCoder](https://github.com/mathllm/MathCoder)

Paper: [https://huggingface.co/papers/2505.10557](https://huggingface.co/papers/2505.10557)

## Introduction

We introduce MathCoder-VL, a series of open-source large multimodal models (LMMs) tailored for general math problem-solving. We also introduce [FigCodifier-8B](https://huggingface.co/MathLLMs/FigCodifier), an image-to-code model.

| Base Model | Ours |
|------------|------|
| [Mini-InternVL-Chat-2B-V1-5](https://huggingface.co/OpenGVLab/Mini-InternVL-Chat-2B-V1-5) | [MathCoder-VL-2B](https://huggingface.co/MathLLMs/MathCoder-VL-2B) |
| [InternVL2-8B](https://huggingface.co/OpenGVLab/InternVL2-8B) | [MathCoder-VL-8B](https://huggingface.co/MathLLMs/MathCoder-VL-8B) |

## Usage

For training and inference code, please refer to [InternVL](https://github.com/OpenGVLab/InternVL).

**Example** (illustrative sketch; adapt it to your setup and see InternVL for the full inference pipeline):

```python
from transformers import pipeline

# MathCoder-VL builds on InternVL, which ships custom modeling code,
# so trust_remote_code=True is required when loading through transformers.
pipe = pipeline(
    "image-text-to-text",
    model="MathLLMs/MathCoder-VL-2B",  # or MathLLMs/MathCoder-VL-8B
    trust_remote_code=True,
    device=0,  # your GPU index, or -1 for CPU
)

image = "path/to/your/image.png"  # replace with your image path
prompt = "What is the area of the shape in this image?"
result = pipe(images=image, text=prompt)
print(result)
```

## Motivation