redmoe-ai-v1 committed
Commit 1f4143c · verified · 1 parent: f466b2a

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +17 -14
  2. config.json +1 -0
  3. figures/new_logo2.png +0 -0
README.md CHANGED
@@ -10,7 +10,7 @@ language:
 # dots1
 
 <p align="center">
- <img src="figures/new_logo.png" width="200"/>
 <p>
 
 <p align="center">
@@ -20,8 +20,6 @@ language:
 </p>
 
 
-
-
 Visit our Hugging Face (click links above), search checkpoints with names starting with `dots.llm1`, or visit the [dots1 collection](https://huggingface.co/collections/rednote-hilab/dotsllm1-68246aaaaba3363374a8aa7c), and you will find all you need! Enjoy!
 
 
@@ -113,6 +111,8 @@ curl http://localhost:8000/v1/chat/completions \
 
 ### Inference with huggingface
 
 #### Text Completion
 
 ```python
@@ -122,8 +122,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
 model_name = "rednote-hilab/dots.llm1.base"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
- model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
- model.generation_config = GenerationConfig.from_pretrained(model_name)
 
 text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
 inputs = tokenizer(text, return_tensors="pt")
@@ -141,8 +140,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
 model_name = "rednote-hilab/dots.llm1.inst"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
- model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager")
- model.generation_config = GenerationConfig.from_pretrained(model_name)
 
 messages = [
 {"role": "user", "content": "Write a piece of quicksort code in C++"}
@@ -154,21 +152,26 @@ result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_token
 print(result)
 ```
 
 
- ### Inference with sglang
- [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models. SGLang could be used to launch a server with OpenAI-compatible API service. `sglang>=***` is required. It is as easy as
 
 ```shell
- python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
 ```
 
 An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
 
- ### Inference with vllm
- [vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. `vllm>=***` is recommended.
 
 ```shell
- vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
 ```
 
 An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
 
 ## 4. Evaluation Results
@@ -186,4 +189,4 @@ If you find `dots.llm1` is useful or want to use in your projects, please kindly
 journal={arXiv preprint arXiv:TBD},
 year={2025}
 }
- ```
 
 # dots1
 
 <p align="center">
+ <img src="figures/new_logo2.png" width="300"/>
 <p>
 
 <p align="center">
 
 </p>
 
 
 Visit our Hugging Face (click links above), search checkpoints with names starting with `dots.llm1`, or visit the [dots1 collection](https://huggingface.co/collections/rednote-hilab/dotsllm1-68246aaaaba3363374a8aa7c), and you will find all you need! Enjoy!
 
 
 ### Inference with huggingface
 
+ We are working to merge `dots.llm1` support into Transformers ([PR #38143](https://github.com/huggingface/transformers/pull/38143)).
+
 #### Text Completion
 
 ```python
 model_name = "rednote-hilab/dots.llm1.base"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
 
 text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
 inputs = tokenizer(text, return_tensors="pt")
 
 model_name = "rednote-hilab/dots.llm1.inst"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
+ model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
 
 messages = [
 {"role": "user", "content": "Write a piece of quicksort code in C++"}
 
 print(result)
 ```
 
+ ### Inference with vllm
+
+ [vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs. Official support for `dots.llm1` is tracked in [PR #18254](https://github.com/vllm-project/vllm/pull/18254).
 
 ```shell
+ vllm serve dots.llm1.inst --port 8000 --tensor-parallel-size 8
 ```
+
 An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
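Once the server is up, any OpenAI-compatible client can talk to this endpoint. A minimal stdlib-only sketch of the request it accepts (the model name and `max_tokens` value are illustrative; actually sending the request requires the server above to be running):

```python
import json
import urllib.request

# Chat-completions endpoint exposed by the OpenAI-compatible server.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "dots.llm1.inst",
    "messages": [{"role": "user", "content": "Write a piece of quicksort code in C++"}],
    "max_tokens": 256,
}
# Build the POST request; attaching a body makes urllib use POST automatically.
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the server running, send it with:
#   body = json.loads(urllib.request.urlopen(req).read())
print(req.get_full_url())
```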
 
+ ### Inference with sglang
+
+ [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision-language models. SGLang can launch a server with an OpenAI-compatible API. Official support for `dots.llm1` is tracked in [PR #6471](https://github.com/sgl-project/sglang/pull/6471).
+
+ Getting started is as simple as running:
 
 ```shell
+ python -m sglang.launch_server --model-path dots.llm1.inst --tp 8 --host 0.0.0.0 --port 8000
 ```
+
 An OpenAI-compatible API will be available at `http://localhost:8000/v1`.
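Because the server follows the OpenAI chat-completions schema, replies can be parsed the same way regardless of backend. A short sketch of pulling the assistant message out of a response body (the JSON below is a hand-written stand-in, not actual model output):

```python
import json

# Hand-written stand-in for an OpenAI-compatible chat-completions response body.
body = '''
{
  "object": "chat.completion",
  "model": "dots.llm1.inst",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "void quicksort(...) { ... }"},
      "finish_reason": "stop"
    }
  ]
}
'''
resp = json.loads(body)
# In this schema the assistant's reply lives at choices[0].message.content.
answer = resp["choices"][0]["message"]["content"]
print(answer)
```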
 
 ## 4. Evaluation Results
 
 journal={arXiv preprint arXiv:TBD},
 year={2025}
 }
+ ```
config.json CHANGED
@@ -28,6 +28,7 @@
 "rope_theta": 10000000,
 "routed_scaling_factor": 2.5,
 "sliding_window": null,
 "tie_word_embeddings": false,
 "torch_dtype": "bfloat16",
 "transformers_version": "4.46.3",
 
 "rope_theta": 10000000,
 "routed_scaling_factor": 2.5,
 "sliding_window": null,
+ "scoring_func": "noaux_tc",
 "tie_word_embeddings": false,
 "torch_dtype": "bfloat16",
 "transformers_version": "4.46.3",
figures/new_logo2.png ADDED