The following command creates an API endpoint at `http://localhost:8000/v1` with a maximum context length of 256K tokens, using tensor parallelism across 4 GPUs.
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1
```
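Once the server is up, the endpoint speaks the OpenAI-compatible chat completions protocol. The sketch below is a minimal client using only the standard library; it assumes the server launched by the command above is running locally, and the `chat` helper name is ours, not part of vLLM.

```python
# Minimal client sketch for the OpenAI-compatible endpoint exposed by `vllm serve`.
# Assumes the server from the command above is running on localhost:8000.
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen3-Next-80B-A3B-Thinking"

def chat(prompt: str) -> str:
    """Send one chat completion request and return the reply text."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python client works the same way if you point its `base_url` at `http://localhost:8000/v1`.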
The following command is recommended for MTP, with the rest of the settings the same as above:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Thinking --port 8000 --tensor-parallel-size 4 --max-model-len 262144 --reasoning-parser deepseek_r1 --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'
```
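The `--speculative-config` flag takes a JSON string, which is easy to misquote in a shell. One way to avoid that, shown as a sketch below, is to build the string with `json.dumps`; the values mirror the command above.

```python
# Build the --speculative-config JSON argument programmatically instead of
# hand-writing it, so quoting and key names stay correct.
import json

spec_config = {"method": "qwen3_next_mtp", "num_speculative_tokens": 2}
flag = f"--speculative-config '{json.dumps(spec_config)}'"
print(flag)
```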
> [!NOTE]