jklj077 committed
Commit 61c2c4f · verified · Parent: 9284628

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -219,12 +219,12 @@ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
 
 The following command can be used to create an API endpoint at `http://localhost:8000/v1` with maximum context length 256K tokens using tensor parallel on 4 GPUs.
 ```shell
-VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --enable-reasoning --reasoning-parser deepseek_r1
+VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1
 ```
 
 The following command is recommended for MTP with the rest settings the same as above:
 ```shell
-VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --enable-reasoning --reasoning-parser deepseek_r1 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
+VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
 ```
 
 > [!Note]
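For context on what the diffed commands set up: `vllm serve` exposes an OpenAI-compatible API at the endpoint mentioned above, so a request against it can be sketched as below. This is a minimal sketch; the prompt and `max_tokens` value are illustrative placeholders, not part of the README.

```python
import json

# Chat-completions payload for the OpenAI-compatible endpoint that the
# `vllm serve` command above exposes at http://localhost:8000/v1.
# The prompt and max_tokens here are illustrative placeholders.
payload = {
    "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "messages": [
        {
            "role": "user",
            "content": "Give me a short introduction to large language models.",
        }
    ],
    "max_tokens": 1024,
}

# Serialize for an HTTP POST to /v1/chat/completions.
body = json.dumps(payload)
print(body)
```

Sending this body to `http://localhost:8000/v1/chat/completions` with `curl -H 'Content-Type: application/json' -d @-` (or pointing the `openai` Python client at the same base URL) assumes the server from the diff is already running.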