
GitHub Repo | Technical Report
👋 Join us on Discord and WeChat
What's New
- [2025-06-05] 🚀🚀🚀 We have open-sourced MiniCPM4-Survey, a model built upon MiniCPM4-8B that is capable of generating trustworthy, long-form survey papers while maintaining competitive performance relative to significantly larger models.
MiniCPM4 Series
The MiniCPM4 series is a family of highly efficient large language models (LLMs) designed explicitly for end-side devices. This efficiency is achieved through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.
- MiniCPM4-8B: The flagship of MiniCPM4, with 8B parameters, trained on 8T tokens.
- MiniCPM4-0.5B: The small version of MiniCPM4, with 0.5B parameters, trained on 1T tokens.
- MiniCPM4-8B-Eagle-FRSpec: Eagle head for FRSpec, accelerating speculative inference for MiniCPM4-8B.
- MiniCPM4-8B-Eagle-FRSpec-QAT-cpmcu: Eagle head trained with QAT for FRSpec, efficiently integrating speculation and quantization to achieve ultra-fast acceleration for MiniCPM4-8B.
- MiniCPM4-8B-Eagle-vLLM: Eagle head in vLLM format, accelerating speculative inference for MiniCPM4-8B.
- MiniCPM4-8B-marlin-Eagle-vLLM: Quantized Eagle head for vLLM format, accelerating speculative inference for MiniCPM4-8B.
- BitCPM4-0.5B: Extreme ternary quantization applied to MiniCPM4-0.5B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
- BitCPM4-1B: Extreme ternary quantization applied to MiniCPM3-1B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
- MiniCPM4-Survey: Based on MiniCPM4-8B, accepts users' queries as input and autonomously generates trustworthy, long-form survey papers. (<-- you are here)
- MiniCPM4-MCP: Based on MiniCPM4-8B, accepts users' queries and available MCP tools as input and autonomously calls relevant MCP tools to satisfy users' requirements.
Overview
MiniCPM4-Survey is an open-source LLM agent model jointly developed by THUNLP, Renmin University of China, and ModelBest. Built on MiniCPM4 with 8 billion parameters, it accepts users' queries as input and autonomously generates trustworthy, long-form survey papers.
Key features include:
- Plan-Retrieve-Write Survey Generation Framework — We propose a multi-agent generation framework that operates in three core stages: planning (defining the overall structure of the survey), retrieval (generating appropriate retrieval keywords), and writing (synthesizing the retrieved information into coherent section-level content). A minimal sketch of this loop appears after this list.
- High-Quality Dataset Construction — We gather and process a large number of expert-written survey papers to construct a high-quality training dataset. In parallel, we collect a large corpus of research papers to build a retrieval database.
- Multi-Aspect Reward Design — We carefully design a reward system covering three aspects (structure, content, and citations) to evaluate survey quality; it serves as the reward function in the RL training stage. An illustrative reward sketch follows the framework sketch below.
- Multi-Step RL Training Strategy — We propose a Context Manager to retain essential information while enabling efficient reasoning, and we construct a Parallel Environment to keep RL training cycles efficient.
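To make the framework concrete, here is a minimal, hypothetical sketch of the plan-retrieve-write loop. Every name here (`llm`, `retriever`, the prompts) is an illustrative stand-in, not the repo's actual API:

```python
# Hypothetical sketch of the plan-retrieve-write loop; `llm` and `retriever`
# are illustrative callables, not the repo's real interfaces.

def generate_survey(query: str, llm, retriever) -> str:
    # Stage 1 (planning): the LLM drafts a section-level outline.
    outline = llm(f"Draft a survey outline for: {query}").splitlines()

    sections = []
    for title in outline:
        # Stage 2 (retrieval): generate keywords, then query the database.
        keywords = llm(f"List search keywords for the section: {title}")
        papers = retriever(keywords, top_k=10)

        # Stage 3 (writing): synthesize retrieved papers into section text.
        body = llm(f"Write the survey section '{title}' using:\n{papers}")
        sections.append(f"## {title}\n{body}")

    return "\n\n".join(sections)
```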
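And an illustrative form of the multi-aspect reward; the scoring functions and equal weights below are placeholder assumptions, not the values from our training setup (see the technical report for details):

```python
# Placeholder sketch of a three-aspect reward; the actual reward model
# and weights are described in the technical report.
def survey_reward(survey: str, score_structure, score_content,
                  score_citations, weights=(1/3, 1/3, 1/3)) -> float:
    aspects = (score_structure(survey), score_content(survey),
               score_citations(survey))
    return sum(w * a for w, a in zip(weights, aspects))
```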
Quick Start
Download the model
Download MiniCPM4-Survey from Hugging Face and place it in model/MiniCPM4-Survey.
We recommend using MiniCPM-Embedding-Light as the embedding model, which can be downloaded from Hugging Face and placed in model/MiniCPM-Embedding-Light.
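If you prefer a scripted download, here is a minimal sketch using huggingface_hub. It assumes the models are hosted under the openbmb organization on Hugging Face; adjust the repo ids if they differ:

```python
# Hedged sketch: the repo ids below assume the openbmb org on Hugging Face.
from huggingface_hub import snapshot_download

snapshot_download("openbmb/MiniCPM4-Survey", local_dir="model/MiniCPM4-Survey")
snapshot_download("openbmb/MiniCPM-Embedding-Light",
                  local_dir="model/MiniCPM-Embedding-Light")
```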
Prepare the environment
You can download the paper data from Kaggle and extract it. Then run python dataset_process.py to process the data and generate the retrieval database, followed by python build_index.py to build the retrieval index:
```bash
cd ./code
curl -L -o ~/Downloads/arxiv.zip \
  https://www.kaggle.com/api/v1/datasets/download/Cornell-University/arxiv
unzip ~/Downloads/arxiv.zip -d .
mkdir data
python ./src/preprocess/dataset_process.py
mkdir index
python ./src/preprocess/build_index.py
```
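As a rough illustration of what the index-building step amounts to (the actual scripts may differ; the file and field names here are hypothetical), one can embed paper abstracts with the embedding model and store them in a FAISS index:

```python
# Minimal sketch, not the repo's actual build_index.py: embed paper
# abstracts with MiniCPM-Embedding-Light and store them in a FAISS index.
import json

import faiss  # pip install faiss-cpu
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("model/MiniCPM-Embedding-Light",
                            trust_remote_code=True)

# "data/papers.jsonl" and the "abstract" field are hypothetical names
# for the output of the preprocessing step.
abstracts = [json.loads(line)["abstract"] for line in open("data/papers.jsonl")]

emb = model.encode(abstracts, batch_size=64, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
index.add(emb)
faiss.write_index(index, "index/papers.faiss")
```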
Model Inference
You can run the following commands to set up the retrieval environment and start inference:

```bash
cd ./code
python ./src/retriever.py
bash ./scripts/run.sh
```
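For a sense of what the retrieval side does, here is a hypothetical top-k query against the index built above. It reuses the assumed paths from the index sketch; the actual retriever.py may expose a different interface:

```python
# Hedged sketch of a top-k similarity query; paths follow the earlier
# hypothetical index sketch, not necessarily the repo's layout.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("model/MiniCPM-Embedding-Light",
                            trust_remote_code=True)
index = faiss.read_index("index/papers.faiss")

q = model.encode(["efficient LLM inference on end-side devices"],
                 normalize_embeddings=True)
scores, ids = index.search(q, 5)  # top-5 most similar abstracts
print(scores, ids)
```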
If you want to run with the frontend, you can run the following commands:

```bash
cd ./code
python ./src/retriever.py
bash ./scripts/run_with_frontend.sh
cd frontend/minicpm4-survey
npm install
npm run dev
```
Then you can visit http://localhost:5173 in your browser to use the model.
Performance Evaluation
| Method | Relevance | Coverage | Depth | Novelty | Avg. | FactScore |
|---|---|---|---|---|---|---|
| Naive RAG (driven by G2FT) | 3.25 | 2.95 | 3.35 | 2.60 | 3.04 | 43.68 |
| AutoSurvey (driven by G2FT) | 3.10 | 3.25 | 3.15 | 3.15 | 3.16 | 46.56 |
| Webthinker (driven by WTR1-7B) | 3.30 | 3.00 | 2.75 | 2.50 | 2.89 | -- |
| Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |
| OpenAI Deep Research (driven by GPT-4o) | 3.50 | 3.95 | 3.55 | 3.00 | 3.50 | -- |
| MiniCPM4-Survey | 3.45 | 3.70 | 3.85 | 3.00 | 3.50 | 68.73 |
| w/o RL | 3.55 | 3.35 | 3.30 | 2.25 | 3.11 | 50.24 |
Performance comparison of survey generation systems, evaluated by GPT-4o. "G2FT" stands for Gemini-2.0-Flash-Thinking, and "WTR1-7B" denotes Webthinker-R1-7B. FactScore evaluation was omitted for Webthinker, as it does not include citation functionality, and for OpenAI Deep Research, which does not provide citations when exporting results. Details of the evaluation can be found in our technical report.
Statement
- As a language model, MiniCPM generates content by learning from a vast amount of text; however, it does not possess the ability to comprehend or express personal opinions or value judgments.
- Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.
LICENSE
- This repository and MiniCPM models are released under the Apache-2.0 License.
Citation
- Please cite our paper if you find our work valuable.
```bibtex
@article{minicpm4,
  title={{MiniCPM4}: Ultra-Efficient LLMs on End Devices},
  author={MiniCPM Team},
  year={2025}
}
```