---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---
## Model

You can download our base 7B model from [moxin-org/moxin-llm-7b](https://huggingface.co/moxin-org/moxin-llm-7b) and our chat 7B model from [moxin-org/moxin-chat-7b](https://huggingface.co/moxin-org/moxin-chat-7b).

## Inference

You can use the following code to run inference with the model. It loads the model from the Hugging Face Hub. If you have saved the model locally (for example, under `./model/`), set `model_name` to that directory instead.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Disable the Flash and memory-efficient scaled dot-product attention kernels on CUDA.
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)

# Hugging Face Hub ID, or a local directory such as './model/'
model_name = 'moxin-org/Moxin-7B-LLM'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# The model is already loaded in bfloat16 and placed on the available devices,
# so the pipeline only needs the model and tokenizer objects.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

prompt = "Can you explain the concept of regularization in machine learning?"

# Sample one completion of up to 1000 new tokens.
sequences = pipe(
    prompt,
    do_sample=True,
    max_new_tokens=1000,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
)
print(sequences[0]['generated_text'])
```

## Evaluation

We evaluate our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) on common benchmarks: AI2 Reasoning Challenge (ARC-C, 25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot). The results are shown in the table below. We release Moxin-7B-finetuned as our base model, and we further finetune this base model on Tulu v2 to obtain our chat model.
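The benchmark suite described above can likely be run through the harness's Python API. The sketch below is an assumption, not the script used to produce the reported numbers: it assumes lm-evaluation-harness v0.4 or later (`pip install lm-eval`), where `lm_eval.simple_evaluate` takes one `num_fewshot` value per call, and it assumes the harness task names `arc_challenge`, `hellaswag`, `mmlu`, `winogrande`, `arc_easy`, and `piqa`. The names `FEW_SHOT`, `ZERO_SHOT_TASKS`, `group_by_shots`, and `run_eval` are hypothetical, and the default model ID is taken from the base-model link above. Because `num_fewshot` applies to every task in a call, tasks are grouped by shot count and each group is evaluated separately; the zero-shot suite reported further down runs as one extra group with `num_fewshot=0`.

```python
from collections import defaultdict

# Few-shot settings from the text, keyed by (assumed) lm-evaluation-harness task names.
FEW_SHOT = {
    "arc_challenge": 25,
    "hellaswag": 10,
    "mmlu": 5,
    "winogrande": 5,
}

# Zero-shot suite (assumed task names).
ZERO_SHOT_TASKS = ["hellaswag", "winogrande", "piqa", "arc_easy", "arc_challenge"]


def group_by_shots(settings):
    """Group tasks that share a shot count, since num_fewshot applies to a whole call."""
    groups = defaultdict(list)
    for task, shots in settings.items():
        groups[shots].append(task)
    return dict(groups)


def run_eval(model_id="moxin-org/moxin-llm-7b", batch_size=8):
    """Evaluate each shot-count group; downloads the model and datasets and needs a GPU."""
    import lm_eval  # imported lazily so the helpers above work without lm-eval installed

    results = {}
    runs = list(group_by_shots(FEW_SHOT).items()) + [(0, ZERO_SHOT_TASKS)]
    for shots, tasks in runs:
        out = lm_eval.simple_evaluate(
            model="hf",
            model_args=f"pretrained={model_id},dtype=bfloat16",
            tasks=tasks,
            num_fewshot=shots,
            batch_size=batch_size,
        )
        # Keep the per-task metrics for this shot count.
        results[shots] = out["results"]
    return results


if __name__ == "__main__":
    # Show the planned runs without starting an evaluation.
    print(group_by_shots(FEW_SHOT))
```

The metric names inside each `results` entry (for example, plain versus length-normalized accuracy) depend on the harness version and task configuration, so check which one matches each reported column before comparing numbers.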
| Models | ARC-C | HellaSwag | MMLU | Winogrande | Avg |
|:----------------------:|:-----:|:---------:|:-----:|:---------:|:-----:|
| Mistral-7B | 57.59 | 83.25 | 62.42 | 78.77 | 70.51 |
| LLaMA 3.1-8B | 54.61 | 81.95 | 65.16 | 77.35 | 69.77 |
| LLaMA 3-8B | 55.46 | 82.09 | 65.29 | 77.82 | 70.17 |
| LLaMA 2-7B | 49.74 | 78.94 | 45.89 | 74.27 | 62.21 |
| Qwen 2-7B | 57.68 | 80.76 | 70.42 | 77.43 | 71.57 |
| gemma-7b | 56.48 | 82.31 | 63.02 | 78.3 | 70.03 |
| internlm2.5-7b | 54.78 | 79.7 | 68.17 | 80.9 | 70.89 |
| Baichuan2-7B | 47.87 | 73.89 | 54.13 | 70.8 | 61.67 |
| Yi-1.5-9B | 58.36 | 80.36 | 69.54 | 77.53 | 71.48 |
| Moxin-7B-original | 53.75 | 75.46 | 59.43 | 70.32 | 64.74 |
| Moxin-7B-finetuned | 59.47 | 83.08 | 60.97 | 78.69 | 70.55 |

We also evaluate zero-shot performance on AI2 Reasoning Challenge (ARC-C), AI2 Reasoning Challenge Easy (ARC-E), HellaSwag, PIQA, and Winogrande, all in the 0-shot setting. The results are shown below.

| Models | HellaSwag | Winogrande | PIQA | ARC-E | ARC-C | Avg |
|:-----------------:|:---------:|:---------:|:-----:|:-----:|:-----:|:-----:|
| Mistral-7B | 80.39 | 73.4 | 82.15 | 78.28 | 52.22 | 73.29 |
| LLaMA 2-7B | 75.99 | 69.06 | 79.11 | 74.54 | 46.42 | 69.02 |
| LLaMA 2-13B | 79.37 | 72.22 | 80.52 | 77.4 | 49.06 | 71.71 |
| LLaMA 3.1-8B | 78.92 | 74.19 | 81.12 | 81.06 | 53.67 | 73.79 |
| gemma-7b | 80.45 | 73.72 | 80.9 | 79.97 | 54.1 | 73.83 |
| Qwen 2-7B | 78.9 | 72.38 | 79.98 | 74.71 | 50.09 | 71.21 |
| internlm2.5-7b | 79.14 | 77.9 | 80.52 | 76.16 | 51.37 | 73.02 |
| Baichuan2-7B | 72.25 | 67.17 | 77.26 | 72.98 | 42.15 | 66.36 |
| Yi-1.5-9B | 77.86 | 73.01 | 80.74 | 79.04 | 55.03 | 73.14 |
| deepseek-7b | 76.13 | 69.77 | 79.76 | 71.04 | 44.8 | 68.3 |
| Moxin-7B-original | 72.06 | 66.31 | 78.07 | 71.47 | 48.15 | 67.21 |
| Moxin-7B-finetuned | 80.03 | 75.17 | 82.24 | 81.12 | 58.64 | 75.44 |

## Citation

```bibtex
@article{zhao2024fully,
  title={Fully Open Source Moxin-7B Technical Report},
  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
  journal={arXiv preprint arXiv:2412.06845},
  year={2024}
}
```