Adding missing <tool> token to generation_config.json for vLLM

#3

The model's generation_config.json is missing <|call|> (token 200012) from the eos_token_id list, so the model does not stop generation after emitting a tool call and the Harmony parser fails with:

openai_harmony.HarmonyError: Unexpected token 12606 while expecting start token 200006

vLLM's chat completion handler for GPT-OSS models tries to inject the Harmony stop tokens (200002, 200012) via default_sampling_params. However, to_sampling_params() in the request protocol takes stop_token_ids directly from the request body (which defaults to []) and never merges in the defaults. The only reliable way to ensure <|call|> stops generation is therefore to include it in eos_token_id in the model's generation_config.json, since the engine always applies those tokens.
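A minimal sketch of the override behavior described above (the function and field names mirror vLLM's, but this is a simplified illustration, not the actual vLLM source):

```python
HARMONY_STOP_TOKENS = [200002, 200012]  # <|return|>, <|call|>

def to_sampling_params(request_body: dict, default_sampling_params: dict) -> dict:
    # stop_token_ids is read straight from the request body (default []);
    # the handler's defaults are never merged in.
    return {"stop_token_ids": request_body.get("stop_token_ids", [])}

# A request that does not set stop_token_ids silently drops the defaults:
params = to_sampling_params({}, {"stop_token_ids": HARMONY_STOP_TOKENS})
assert params["stop_token_ids"] == []
```

This is why the Harmony stop tokens never reach the sampler unless they are in eos_token_id, which the engine applies unconditionally.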

The fix adds 200012 (<|call|>) to the eos_token_id array:

"eos_token_id": [
  200002,
  200012,
  199999
]
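For anyone applying the same fix to a local checkpoint before this PR lands, a small helper along these lines (the checkpoint path is hypothetical) can patch generation_config.json:

```python
import json
from pathlib import Path

CALL_TOKEN_ID = 200012  # <|call|>

def add_call_token(config: dict) -> dict:
    """Ensure <|call|> (200012) is present in eos_token_id."""
    eos = config.get("eos_token_id", [])
    if isinstance(eos, int):  # eos_token_id may be a bare int
        eos = [eos]
    if CALL_TOKEN_ID not in eos:
        eos.append(CALL_TOKEN_ID)
    config["eos_token_id"] = eos
    return config

# Example usage against a local download (path is an assumption):
# path = Path("gpt-oss/generation_config.json")
# path.write_text(json.dumps(add_call_token(json.loads(path.read_text())), indent=2))

patched = add_call_token({"eos_token_id": [200002, 199999]})
assert CALL_TOKEN_ID in patched["eos_token_id"]
```

The order of the IDs does not matter; the engine treats eos_token_id as a set of stop tokens.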

Without the fix, any request with tools fails with a 500 error. With the fix, tool calling works correctly in both streaming and non-streaming modes.

Tested on vLLM v0.18.0 with both BF16 and MXFP4 checkpoints.

evilfreelancer changed pull request title from Update generation_config.json to Adding missing <tool> token to generation_config.json for vLLM
hammadtime changed pull request status to merged
