redmoe-ai-v1 committed
Commit f466b2a · verified · 1 Parent(s): 32920f9

Upload folder using huggingface_hub

LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 rednote-hilab
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,189 @@
+ ---
+ license: mit
+ license_link: https://huggingface.co/rednote-hilab/dots.llm1.base/blob/main/LICENSE
+ library_name: transformers
+ language:
+ - en
+ - zh
+ ---
+
+ # dots1
+
+ <p align="center">
+ <img src="figures/new_logo.png" width="200"/>
+ </p>
+
+ <p align="center">
+ &nbsp;&nbsp;🤗 <a href="https://huggingface.co/rednote-hilab">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf">Paper</a>&nbsp;&nbsp;
+ <br>
+ 🖥️ <a href="https://huggingface.co/spaces/rednote-hilab/dots-demo">Demo</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="figures/wechat.png">WeChat (微信)</a>&nbsp;&nbsp; | &nbsp;&nbsp;📕 <a href="https://www.xiaohongshu.com/user/profile/683ffe42000000001d021a4c">rednote</a>&nbsp;&nbsp;
+ </p>
+
+
+ Visit our Hugging Face organization (links above) and search for checkpoints whose names start with `dots.llm1`, or browse the [dots1 collection](https://huggingface.co/collections/rednote-hilab/dotsllm1-68246aaaaba3363374a8aa7c), to find everything you need. Enjoy!
+
+
+ ## News
+
+ - 2025.06.06: We released the `dots.llm1` series. Check our [report](https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf) for more details!
+
+
+ ## 1. Introduction
+
+ The `dots.llm1` model is a large-scale MoE model that activates 14B parameters out of a total of 142B, delivering performance on par with state-of-the-art models.
+ Leveraging our meticulously crafted and efficient data processing pipeline, `dots.llm1` achieves performance comparable to Qwen2.5-72B after being pretrained on 11.2T high-quality tokens without synthetic data. To foster further research, we open-source intermediate training checkpoints at every one trillion tokens, providing valuable insights into the learning dynamics of large language models.
+
+
+ <p align="center">
+ <img width="90%" src="./figures/performance.png">
+ </p>
+
+ ## 2. Model Summary
+
+ **This repository contains the base and instruction-tuned `dots.llm1` models**, which have the following features:
+
+ - Type: An MoE model with 14B activated and 142B total parameters, trained on 11.2T tokens.
+ - Training Stages: Pretraining and SFT.
+ - Architecture: Multi-head attention with QK-Norm in the attention layers, and a fine-grained MoE that routes each token to the top 6 of 128 routed experts, plus 2 shared experts (see the routing sketch at the end of this section).
+ - Number of Layers: 62
+ - Number of Attention Heads: 32
+ - Supported Languages: English, Chinese
+ - Context Length: 32,768 tokens
+ - License: MIT
+
+ The highlights of `dots.llm1` include:
+
+ - **Enhanced Data Processing**: We propose a scalable and fine-grained *three-stage* data processing framework designed to generate large-scale, high-quality and diverse data for pretraining.
+ - **No Synthetic Data during Pretraining**: *11.2 trillion* high-quality, non-synthetic tokens were used in base model pretraining.
+ - **Performance and Cost Efficiency**: `dots.llm1` is an open-source model that activates only *14B* parameters at inference, delivering both comprehensive capabilities and high computational efficiency.
+ - **Infrastructure**: We introduce an innovative MoE all-to-all communication and computation overlapping recipe based on interleaved 1F1B pipeline scheduling and an efficient grouped GEMM implementation to boost computational efficiency.
+ - **Open Accessibility to Model Dynamics**: Intermediate model checkpoints are released for *every 1T tokens* trained, facilitating future research into the learning dynamics of large language models.
+
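+ To make the expert-routing bullet above concrete, here is a minimal sketch of top-k routing with shared experts, matching the top-6-of-128 plus 2-shared configuration. It is illustrative only: the names `moe_forward`, `router`, `routed_experts`, and `shared_experts` are assumptions for this example, not the actual `Dots1ForCausalLM` implementation (which, per `config.json`, also applies a routed scaling factor).
+
+ ```python
+ # Illustrative sketch of top-k MoE routing with shared experts -- not the actual Dots1 code.
+ import torch
+ import torch.nn.functional as F
+
+ def moe_forward(x, router, routed_experts, shared_experts, top_k=6):
+     """x: (tokens, hidden). Each token is sent to its top_k routed experts
+     (out of len(routed_experts)); shared experts process every token."""
+     scores = F.softmax(router(x), dim=-1)                # (tokens, n_routed_experts)
+     topk_scores, topk_idx = scores.topk(top_k, dim=-1)   # (tokens, top_k)
+     topk_scores = topk_scores / topk_scores.sum(-1, keepdim=True)  # norm_topk_prob
+
+     out = torch.zeros_like(x)
+     for slot in range(top_k):
+         for e, expert in enumerate(routed_experts):
+             mask = topk_idx[:, slot] == e                # tokens whose slot-th choice is expert e
+             if mask.any():
+                 out[mask] += topk_scores[mask, slot].unsqueeze(-1) * expert(x[mask])
+     for expert in shared_experts:                        # always-on shared experts
+         out = out + expert(x)
+     return out
+ ```
+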
+ ## 3. Example Usage
+
+ ### Model Downloads
+
+ <div align="center">
+
+ | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download Link** |
+ | :------------: | :------------: | :------------: | :------------: | :------------: |
+ | dots.llm1.base | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.base) |
+ | dots.llm1.inst | 142B | 14B | 32K | [🤗 Hugging Face](https://huggingface.co/rednote-hilab/dots.llm1.inst) |
+
+ </div>
+
+ ### Docker (recommended)
+
+ Docker images are available on [Docker Hub](https://hub.docker.com/repository/docker/rednotehilab/dots1/tags), built on top of the official vLLM images.
+
+ You can start an OpenAI-compatible server via vLLM:
+
+ ```shell
+ docker run --gpus all \
+     -v ~/.cache/huggingface:/root/.cache/huggingface \
+     -p 8000:8000 \
+     --ipc=host \
+     rednotehilab/dots1:vllm-openai-v0.9.0.1 \
+     --model rednote-hilab/dots.llm1.inst \
+     --tensor-parallel-size 8 \
+     --trust-remote-code \
+     --served-model-name dots1
+ ```
+
+ You can then verify that the model is running with the following request:
+
+ ```shell
+ curl http://localhost:8000/v1/chat/completions \
+     -H "Content-Type: application/json" \
+     -d '{
+         "model": "dots1",
+         "messages": [
+             {"role": "system", "content": "You are a helpful assistant."},
+             {"role": "user", "content": "Who won the world series in 2020?"}
+         ],
+         "max_tokens": 32,
+         "temperature": 0
+     }'
+ ```
+
+
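+ Since the server exposes an OpenAI-compatible API, you can also query it from Python. The sketch below is a hedged example rather than part of the official tooling: it assumes the container above is serving at `http://localhost:8000/v1` under the name `dots1`, and it uses the `openai` client package (the API key is a dummy value, which vLLM ignores unless the server was started with an API key).
+
+ ```python
+ # Hedged example: query the OpenAI-compatible endpoint exposed by the vLLM server above.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is a placeholder
+
+ response = client.chat.completions.create(
+     model="dots1",  # matches --served-model-name above
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": "Who won the world series in 2020?"},
+     ],
+     max_tokens=32,
+     temperature=0,
+ )
+ print(response.choices[0].message.content)
+ ```
+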
+ ### Inference with Hugging Face Transformers
+
+ #### Text Completion
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+
+ model_name = "rednote-hilab/dots.llm1.base"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
+ model.generation_config = GenerationConfig.from_pretrained(model_name)
+
+ text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)
+ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(result)
+ ```
+
+ #### Chat Completion
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+
+ model_name = "rednote-hilab/dots.llm1.inst"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
+ model.generation_config = GenerationConfig.from_pretrained(model_name)
+
+ messages = [
+     {"role": "user", "content": "Write a piece of quicksort code in C++"}
+ ]
+ input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+ outputs = model.generate(input_tensor.to(model.device), max_new_tokens=200)
+
+ result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
+ print(result)
+ ```
+
+
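+ Note that the instruct model's chat template (shipped in `tokenizer_config.json`) wraps turns in dedicated special tokens such as `<|system|>`, `<|userprompt|>`, and `<|response|>` rather than the common `<|im_start|>` format. If you build prompts by hand instead of calling `apply_chat_template`, you can inspect the rendered string first; the commented output below is what the template in this repository produces for a simple two-message conversation.
+
+ ```python
+ # Render the chat template to a plain string (no model weights or GPU needed).
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("rednote-hilab/dots.llm1.inst")
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "Write a piece of quicksort code in C++"},
+ ]
+ prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ print(prompt)
+ # Expected form, per the chat_template in tokenizer_config.json (a single line):
+ # <|system|>You are a helpful assistant.<|endofsystem|><|userprompt|>Write a piece of quicksort code in C++<|endofuserprompt|><|response|>
+ ```
+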
+ ### Inference with sglang
+ [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models. SGLang can be used to launch a server with an OpenAI-compatible API. `sglang>=***` is required. Launching a server is as easy as
+
+ ```shell
+ python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
+ ```
+ An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
+
+ ### Inference with vllm
+ [vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. `vllm>=***` is recommended.
+
+ ```shell
+ vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
+ ```
+ An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
+
+ ## 4. Evaluation Results
+
+ Detailed evaluation results are reported in this [📑 report](https://github.com/rednote-hilab/dots.llm1/blob/main/dots1_tech_report.pdf).
+
+ ## Citation
+
+ If you find `dots.llm1` useful or want to use it in your projects, please kindly cite our paper:
+
+ ```
+ @article{dots1,
+   title={dots.llm1 Technical Report},
+   author={rednote-hilab},
+   journal={arXiv preprint arXiv:TBD},
+   year={2025}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "Dots1ForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": null,
+   "eos_token_id": 151643,
+   "first_k_dense_replace": 1,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 10944,
+   "max_position_embeddings": 32768,
+   "model_type": "dots1",
+   "moe_intermediate_size": 1408,
+   "moe_layer_freq": 1,
+   "n_routed_experts": 128,
+   "n_shared_experts": 2,
+   "norm_topk_prob": true,
+   "num_attention_heads": 32,
+   "num_experts_per_tok": 6,
+   "num_hidden_layers": 62,
+   "num_key_value_heads": 32,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": null,
+   "rope_theta": 10000000,
+   "routed_scaling_factor": 2.5,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.46.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
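
The configuration above fixes the MoE geometry described in the README: 128 routed experts with 6 active per token, 2 shared experts, 62 layers, and a 32,768-token context. As a quick, hedged sanity check after downloading the repository, you can read these fields back with `transformers.AutoConfig`; depending on your `transformers` version you may need a recent release (or `trust_remote_code=True`) for the `dots1` model type to resolve.

```python
# Hedged sanity check: read the MoE fields straight from this repo's config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rednote-hilab/dots.llm1.base")
print(config.model_type)               # dots1
print(config.n_routed_experts)         # 128
print(config.num_experts_per_tok)      # 6
print(config.n_shared_experts)         # 2
print(config.num_hidden_layers)        # 62
print(config.max_position_embeddings)  # 32768
```
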
figures/XHSlong750px.png ADDED
figures/new_logo.png ADDED

Git LFS Details

  • SHA256: 2e5808698bcd60df90869af469743248a4560d0ffb2232eceb74cd9c0a7df763
  • Pointer size: 131 Bytes
  • Size of remote file: 101 kB
figures/performance.png ADDED

Git LFS Details

  • SHA256: ca42a057f65c1ea12c303e41938dbe38fc285769002272af767b76605cf8ea98
  • Pointer size: 131 Bytes
  • Size of remote file: 139 kB
figures/wechat.png ADDED

Git LFS Details

  • SHA256: e6f386b64bd313bd998bf0f25e9f1b32c0fbbfe7d972a60227c22fdc044da885
  • Pointer size: 131 Bytes
  • Size of remote file: 118 kB
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": null,
+   "eos_token_id": 151643,
+   "transformers_version": "4.46.3"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,144 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|userprompt|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|endofuserprompt|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|response|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|endofresponse|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|system|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|endofsystem|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|observation|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|endofobservation|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|execution|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|endofexecution|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|reject-unknown|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<|sec-cot|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151658": {
+       "content": "<|sec-end-cot|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|userprompt|>", "<|endofuserprompt|>", "<|response|>", "<|endofresponse|>", "<|system|>", "<|endofsystem|>", "<|observation|>", "<|endofobservation|>", "<|execution|>", "<|endofexecution|>", "<|reject-unknown|>", "<|sec-cot|>", "<|sec-end-cot|>"],
+   "bos_token": null,
+   "chat_template": "{% if messages[0]['role'] == 'system' %}<|system|>{{ messages[0]['content'] }}<|endofsystem|>{% set start_idx = 1 %}{% else %}<|system|><|endofsystem|>{% set start_idx = 0 %}{% endif %}{% for idx in range(start_idx, messages|length) %}{% if messages[idx]['role'] == 'user' %}<|userprompt|>{{ messages[idx]['content'] }}<|endofuserprompt|>{% elif messages[idx]['role'] == 'assistant' %}<|response|>{{ messages[idx]['content'] }}<|endofresponse|>{% endif %}{% endfor %}{% if add_generation_prompt and messages[-1]['role'] == 'user' %}<|response|>{% endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "model_max_length": 32768,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff