Adding missing <tool> token to generation_config.json for vLLM
The model's generation_config.json is missing <|call|> (token 200012) in the eos_token_id list. As a result, the model does not stop generating after emitting a tool call, and the Harmony parser fails with:
openai_harmony.HarmonyError: Unexpected token 12606 while expecting start token 200006
vLLM's chat completion handler for GPT-OSS models tries to inject the Harmony stop tokens (200002, 200012) via default_sampling_params. However, to_sampling_params() in the request protocol takes stop_token_ids directly from the request body (which defaults to []) and never merges in those defaults. The only reliable way to ensure <|call|> stops generation is therefore to list it in eos_token_id in the model's generation_config.json, since the engine always applies those tokens.
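A minimal sketch of the behavior described above, using hypothetical function names (the real vLLM code paths are more involved): the request-supplied stop_token_ids replaces the handler's defaults outright instead of being merged with them, so an empty request body silently drops the Harmony stop tokens.

```python
# Harmony stop tokens the handler intends to inject:
# 200002 = <|return|>, 200012 = <|call|>
HARMONY_STOP_TOKENS = [200002, 200012]

def to_sampling_params(request_stop_token_ids, default_stop_token_ids):
    # Mirrors the buggy behavior: the request value (default []) wins
    # outright, and default_stop_token_ids is ignored.
    return {"stop_token_ids": request_stop_token_ids}

def to_sampling_params_merged(request_stop_token_ids, default_stop_token_ids):
    # What a merge would look like: the union of both sources.
    merged = sorted(set(request_stop_token_ids) | set(default_stop_token_ids))
    return {"stop_token_ids": merged}

print(to_sampling_params([], HARMONY_STOP_TOKENS))
# {'stop_token_ids': []}  -- <|call|> never stops generation
print(to_sampling_params_merged([], HARMONY_STOP_TOKENS))
# {'stop_token_ids': [200002, 200012]}
```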
The fix adds 200012 (<|call|>) to the eos_token_id array:
"eos_token_id": [
200002,
200012,
199999
]
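The edit can also be applied programmatically. The helper below is a sketch, not part of the patch; it appends 200012 to eos_token_id if absent, normalizing a scalar eos_token_id to a list first.

```python
import json

CALL_TOKEN_ID = 200012  # <|call|>

def add_call_token(config: dict) -> dict:
    """Ensure <|call|> (200012) appears in the eos_token_id list."""
    eos = config.get("eos_token_id", [])
    if isinstance(eos, int):
        # generation_config.json allows a scalar here; normalize to a list.
        eos = [eos]
    if CALL_TOKEN_ID not in eos:
        eos.append(CALL_TOKEN_ID)
    config["eos_token_id"] = eos
    return config

# Example: a config missing <|call|>, as in the bug.
config = {"eos_token_id": [200002, 199999]}
print(json.dumps(add_call_token(config)))
# {"eos_token_id": [200002, 199999, 200012]}
```

To patch a checkpoint in place, load its generation_config.json, run it through this helper, and write it back.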
Without the fix, any request with tools fails with a 500 error. With the fix, tool calling works correctly in both streaming and non-streaming modes.
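For reference, a minimal tool-calling request of the kind that previously returned a 500, built for vLLM's OpenAI-compatible /v1/chat/completions endpoint. The model name and tool definition are placeholders; the payload shape follows the standard Chat Completions tools schema.

```python
import json

# Placeholder payload for POST /v1/chat/completions on a vLLM server.
payload = {
    "model": "openai/gpt-oss-20b",  # example model name; use your checkpoint
    "messages": [
        {"role": "user", "content": "What is the weather in Paris?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
```

With the patched generation_config.json, the model emits <|call|>, generation stops, and the response carries a parsed tool call instead of a Harmony parse error.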
Tested on vLLM v0.18.0 with both BF16 and MXFP4 checkpoints.