--enable-reasoning has been removed in vLLM 0.10
Hello!
Thank you for your brilliant work! Now we can serve your super quantized model with an OpenAI-compatible API.
But --enable-reasoning has been removed in vLLM 0.10; when we start with this flag, vLLM reports that it is not supported.
After removing it, when using Open WebUI, there is no longer a thinking block, because your default template contains....
So could you please remove that default to avoid the forced thinking?
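For reference, here is a rough sketch of how the rendered prompt can be inspected to see what the default template appends at the end (the model id below is just a placeholder, not the actual repo name):

```python
# Sketch: render the chat template to see what gets appended after the user turn.
# "your-org/your-quantized-model" is a placeholder for the actual repo id.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-quantized-model")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)

# If the rendered prompt already ends with an opening think tag, the model
# never emits it itself, so Open WebUI cannot detect the thinking block.
print(repr(prompt[-80:]))
```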
Thanks for reporting this issue. I've removed the --enable-reasoning flag.
However, I’m not sure how to modify the default template. It would be helpful if you could either submit a pull request or provide more details, as I simply copied the files from the original models.
I just edited the last 3 lines of chat_template.jinja, and the thinking block has returned:
```jinja
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
```
I'm not sure whether it's really necessary to change that... maybe vLLM has some other way to split the output into thinking content. Let me have a search :)
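For what it's worth, here is a sketch of how I'd expect that to look from the client side, assuming the server is relaunched with one of vLLM's reasoning parsers (the model id is a placeholder, and the right parser name depends on the model family):

```python
# Sketch: check whether vLLM splits the reasoning out of the final answer.
# Assumes the server was started with a reasoning parser enabled,
# e.g. something like: vllm serve <model> --reasoning-parser deepseek_r1
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="your-org/your-quantized-model",  # placeholder repo id
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

message = response.choices[0].message
# With a reasoning parser enabled, the chain of thought should come back
# as a separate reasoning_content field rather than inside the answer text.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```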