TencentARC/MetaMath-Mistral-Pro


WuChengyue committed on Feb 27, 2024

Commit bd4b87a · verified · 1 Parent(s): 366a05d

Update README.md

Files changed (1)

README.md +82 -0


@@ -1,3 +1,85 @@
 ---
 license: apache-2.0
+datasets:
+- meta-math/MetaMathQA
+language:
+- en
+metrics:
+- accuracy
 ---
+---
+license: apache-2.0
+datasets:
+- meta-math/MetaMathQA
+---
+See our paper: https://arxiv.org/abs/2401.02415
+View the project page:
+https://github.com/TencentARC/LLaMA-Pro
+## Model Details
+MetaMath-Mistral-Pro is a full fine-tune of the Mistral-Pro model on the MetaMathQA dataset.
+## Model Usage
+Prompting template:
+```
+"Below is an instruction that describes a task. "
+"Write a response that appropriately completes the request.\n\n"
+"### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
+```
+Replace `{instruction}` with your question.
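
The prompting template above can be filled in programmatically. The following is a minimal sketch, not part of the original README: the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the only thing taken from the model card is the template text itself.

```python
# Hypothetical helper (not from the original README): fills the prompting
# template from the "Model Usage" section with a user question.

# The three adjacent string literals are concatenated into one template,
# exactly as written in the model card.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
)


def build_prompt(instruction: str) -> str:
    """Return the full prompt text for a single question."""
    # Stripping surrounding whitespace is an assumption made here for
    # tidiness; the README does not say how to normalize the question.
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("What is 12 * 7?"))
```

The returned string is what would be tokenized and passed to the model as input text. How the model is loaded and sampled is left open here, since the README does not specify it.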
+## Experiments
+| Model               | GSM8K Pass@1 (%) | MATH Pass@1 (%) |
+|---------------------|------------------|-----------------|
+| MPT-7B              | 6.8          | 3.0         |
+| Falcon-7B           | 6.8          | 2.3         |
+| LLaMA-1-7B          | 11.0         | 2.9         |
+| LLaMA-2-7B          | 14.6         | 2.5         |
+| MPT-30B             | 15.2         | 3.1         |
+| LLaMA-1-13B         | 17.8         | 3.9         |
+| GPT-Neo-2.7B        | 19.5         | --          |
+| Falcon-40B          | 19.6         | 2.5         |
+| Baichuan-chat-13B   | 23.9         | --          |
+| Vicuna-v1.3-13B     | 27.6         | --          |
+| LLaMA-2-13B         | 28.7         | 3.9         |
+| InternLM-7B         | 31.2         | --          |
+| ChatGLM-2-6B        | 32.4         | --          |
+| GPT-J-6B            | 34.9         | --          |
+| LLaMA-1-33B         | 35.6         | 3.9         |
+| LLaMA-2-34B         | 42.2         | 6.24        |
+| RFT-7B              | 50.3         | --          |
+| LLaMA-1-65B         | 50.9         | 10.6        |
+| Qwen-7B             | 51.6         | --          |
+| WizardMath-7B       | 54.9         | 10.7        |
+| LLaMA-2-70B         | 56.8         | 13.5        |
+| WizardMath-13B      | 63.9         | 14.0        |
+| MAmmoTH-7B (COT)    | 50.5         | 10.4        |
+| MAmmoTH-7B (POT+COT)| 53.6         | 31.5        |
+| Arithmo-Mistral-7B  | 74.7         | 25.3        |
+| MetaMath-7B         | 66.5         | 19.8        |
+| MetaMath-13B        | 72.3         | 22.4        |
+| MetaMath-Mistral-7B | 77.7         | 28.2        |
+| MetaMath-Llemma-7B  | 69.2         | 30.0        |
+| 🔥 **MetaMath-Mistral-Pro** | **78.4**     | **30.3**        |
+## Citation
+```bibtex
+@article{wu2024llama,
+  title={LLaMA Pro: Progressive LLaMA with Block Expansion},
+  author={Wu, Chengyue and Gan, Yukang and Ge, Yixiao and Lu, Zeyu and Wang, Jiahao and Feng, Ye and Luo, Ping and Shan, Ying},
+  journal={arXiv preprint arXiv:2401.02415},
+  year={2024}
+}
+```