The following command creates an API endpoint at `http://localhost:8000/v1` with a maximum context length of 256K tokens, using tensor parallelism across 4 GPUs.
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1
```
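Once the server is up, the endpoint speaks the OpenAI-compatible chat completions protocol. The sketch below is a minimal client using only the standard library; it assumes the server launched by the command above is running locally, and the `chat` helper name is ours, not part of vLLM.

```python
# Minimal client sketch for the OpenAI-compatible endpoint exposed by `vllm serve`.
# Assumes the server from the command above is running on localhost:8000.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen3-Next-80B-A3B-Thinking"

def chat(prompt: str) -> str:
    """Send one chat completion request and return the reply text."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client works the same way if you point its `base_url` at `http://localhost:8000/v1`.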
The following command is recommended for MTP, with the rest of the settings the same as above:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
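The `--speculative-config` flag takes a JSON string, which is easy to misquote in a shell. One way to avoid that, shown as a sketch below, is to build the string with `json.dumps`; the values mirror the command above.

```python
# Build the --speculative-config JSON argument programmatically instead of
# hand-writing it, so quoting and key names stay correct.
import json

spec_config = {"method": "qwen3_next_mtp", "num_speculative_tokens": 2}
flag = f"--speculative-config '{json.dumps(spec_config)}'"
print(flag)
```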
> [!NOTE]