👋 Join us on Discord and WeChat

What's New

[2025.06.06] MiniCPM4 series are released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find technical report here.🔥🔥🔥

MiniCPM4 Series

MiniCPM4 series are highly efficient large language models (LLMs) designed explicitly for end-side devices, which achieves this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.

MiniCPM4-8B: The flagship of MiniCPM4, with 8B parameters, trained on 8T tokens.
MiniCPM4-0.5B: The small version of MiniCPM4, with 0.5B parameters, trained on 1T tokens.
MiniCPM4-8B-Eagle-FRSpec: Eagle head for FRSpec, accelerating speculative inference for MiniCPM4-8B.
MiniCPM4-8B-Eagle-FRSpec-QAT-cpmcu: Eagle head trained with QAT for FRSpec, efficiently integrate speculation and quantization to achieve ultra acceleration for MiniCPM4-8B.
MiniCPM4-8B-Eagle-vLLM: Eagle head in vLLM format, accelerating speculative inference for MiniCPM4-8B.
MiniCPM4-8B-marlin-Eagle-vLLM: Quantized Eagle head for vLLM format, accelerating speculative inference for MiniCPM4-8B.
BitCPM4-0.5B: Extreme ternary quantization applied to MiniCPM4-0.5B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
BitCPM4-1B: Extreme ternary quantization applied to MiniCPM3-1B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
MiniCPM4-Survey: Based on MiniCPM4-8B, accepts users' quiries as input and autonomously generate trustworthy, long-form survey papers.
MiniCPM4-MCP: Based on MiniCPM4-8B, accepts users' queries and available MCP tools as input and autonomously calls relevant MCP tools to satisfy users' requirements. (<-- you are here)

Introduction

MiniCPM4-MCP is an open-source on-device LLM agent model jointly developed by THUNLP, Renmin University of China and ModelBest, built on MiniCPM-4 with 8 billion parameters. It is capable of solving a wide range of real-world tasks by interacting with various tool and data resources through MCP.

Usage

As of now, MiniCPM4-MCP supports the following:

Utilization of tools across 16 MCP servers: These servers span various categories, including office, lifestyle, communication, information, and work management.
Single-tool-calling capability: It can perform single- or multi-step tool calls using a single tool that complies with the MCP.
Cross-tool-calling capability: It can perform single- or multi-step tool calls using different tools that complies with the MCP.

Inference

MCP Servers Deployment

The MCP Servers supported by MiniCPM4-MCP include Airbnb, Amap-Maps, Arxiv-MCP-Server, Calculator, Computer-Control-MCP, Desktop-commander, Filesystem, Github, Gaode, MCP-Code-Executor, MCP-DOCx, PPT, PPTx, Simple-Time-Server, Slack, and Whisper. Follow the instructions provided in each server's repository for successful deployment. Note that not all tools in these servers will function properly in every environment. Some tools are unstable and may return errors such as timeouts or HTTP errors. During training data construction, tools with consistently high failure rates (e.g., those for which the LLM fails to produce a successful query even after hundreds of attempts) are filtered out.

MCP Client Setup

We modified the existing MCP Client from the mcp-cli repository to enable interaction between MiniCPM and MCP Servers.
After the MCP Client performs a handshake with a Server, it retrieves a list of available tools. An example of tool information contained in this list is provided in available_tool_example.json.

Once the available tools and user query are obtained, results can be generated using the following script logic:

python generate_example.py \
--tokenizer_path {path to MiniCPM4 tokenizer} \
--base_url {vllm deployment URL} \
--model {model name used in vllm deployment} \
--output_path {path to save results}

where the generate_example.py is located in link and MiniCPM4 generates tool calls in the following format:

    <|tool_call_start|>
    ```python 
    read_file(path="/path/to/file")
    ```
    <|tool_call_end|>

You can build a custom parser for MiniCPM4 tool calls based on this format. The relevant parsing logic is located in generate_example.py.

Since the mcp-cli repository supports the vLLM inference framework, MiniCPM4-MCP can also be integrated into mcp-cli by modifying vLLM accordingly.
Specifically, follow the instructions in this link to enable interaction between a client running the MiniCPM4-MCP model and the MCP Server.

Evaluation

The detailed evaluation script can be found on the GitHub page. The evaluation results are presented below.

MCP Server		gpt-4o			qwen3			minicpm4
	func	param	value	func	param	value	func	param	value
Airbnb	89.3	67.9	53.6	92.8	60.7	50.0	96.4	67.9	50.0
Amap-Maps	79.8	77.5	50.0	74.4	72.0	41.0	89.3	85.7	39.9
Arxiv-MCP-Server	85.7	85.7	85.7	81.8	54.5	50.0	57.1	57.1	52.4
Calculator	100.0	100.0	20.0	80.0	80.0	13.3	100.0	100.0	6.67
Computor-Control-MCP	90.0	90.0	90.0	90.0	90.0	90.0	90.0	90.0	86.7
Desktop-Commander	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
Filesystem	63.5	63.5	31.3	69.7	69.7	26.0	83.3	83.3	42.7
Github	92.0	80.0	58.0	80.5	50.0	27.7	62.8	25.7	17.1
Gaode	71.1	55.6	17.8	68.8	46.6	24.4	68.9	46.7	15.6
MCP-Code-Executor	85.0	80.0	70.0	80.0	80.0	70.0	90.0	90.0	65.0
MCP-Docx	95.8	86.7	67.1	94.9	81.6	60.1	95.1	86.6	76.1
PPT	72.6	49.8	40.9	85.9	50.7	37.5	91.2	72.1	56.7
PPTx	64.2	53.7	13.4	91.0	68.6	20.9	91.0	58.2	26.9
Simple-Time-Server	90.0	70.0	70.0	90.0	90.0	90.0	90.0	60.0	60.0
Slack	100.0	90.0	70.0	100.0	100.0	65.0	100.0	100.0	100.0
Whisper	90.0	90.0	90.0	90.0	90.0	90.0	90.0	90.0	30.0
Average	80.2	70.2	49.1	83.5	67.7	43.8	88.3	76.1	51.2

Statement

As a language model, MiniCPM generates content by learning from a vast amount of text.
However, it does not possess the ability to comprehend or express personal opinions or value judgments.
Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.

LICENSE

This repository and MiniCPM models are released under the Apache-2.0 License.

Citation

Please cite our paper if you find our work valuable.

@article{minicpm4,
  title={{MiniCPM4}: Ultra-Efficient LLMs on End Devices},
  author={MiniCPM Team},
  year={2025}
}

Downloads last month: 95

Safetensors

Model size

8B params

Tensor type

BF16

Model tree for openbmb/MiniCPM4-MCP

Quantizations

4 models

Collection including openbmb/MiniCPM4-MCP

MiniCPM4

Collection

MiniCPM4: Ultra-Efficient LLMs on End Devices • 29 items • Updated Sep 8 • 77