`olmo3` reasoning parser crashes at startup on Domyn-Small-v1.0 tokenizer

#1
by alescire94 - opened

Hi! I'm running into a startup crash when running the model card's command with the olmo3 reasoning parser. Details below.

vLLM version: 0.21.0
Command:

uv run vllm serve domyn/Domyn-Small-v1.0 \
    --tensor-parallel-size 1 \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --max-num-seqs 256 \
    --reasoning-parser olmo3

Error:

File ".../vllm/reasoning/olmo3_reasoning_parser.py", line 242, in __init__
    self.vocab[token] for token in self.think_end_first_split
KeyError: 'Ġ</'

Repro:

  1. uv add vllm==0.21.0
  2. Run the command above.
  3. Server crashes at startup with the traceback shown.

Hi, thanks for the report, we were able to reproduce it on our side.

It's a parser/tokenizer mismatch in vLLM 0.21's Olmo3ReasoningParser: its __init__ does an eager lookup of GPT-2-BPE token strings ('Ġ</' etc.) in the vocab, but Domyn-Small uses a SentencePiece tokenizer where <think> / </think> aren't single vocab tokens and those BPE strings don't exist. The lookup raises KeyError, so the server dies at startup before serving any request.

vLLM 0.20 didn't have this eager check, which is why it worked there.
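
For anyone who wants to see the failure mode in isolation, here is a minimal sketch. It is not vLLM's actual code: the toy vocabularies, the `required_pieces` list, and both helper functions are made up for illustration. Only the 'Ġ</' string comes from the traceback above.

```python
# Toy vocabularies. The entries are illustrative assumptions,
# not the real contents of either tokenizer.
bpe_vocab = {"Ġ</": 101, "think": 102, ">": 103}
other_vocab = {"▁": 5, "<": 6, "/": 7, "think": 8, ">": 9}

# Hypothetical stand-in for the pieces the parser looks up at init.
# Only 'Ġ</' is confirmed by the traceback in this thread.
required_pieces = ["Ġ</"]


def eager_lookup(vocab, pieces):
    """Mimic an eager init-time lookup: raises KeyError if a piece is missing."""
    return [vocab[piece] for piece in pieces]


def missing_pieces(vocab, pieces):
    """Return the pieces that are absent from the vocab, without raising."""
    return [piece for piece in pieces if piece not in vocab]


# Works when the vocab contains the BPE-style piece.
print(eager_lookup(bpe_vocab, required_pieces))

# A non-raising check reports what is missing instead of crashing.
print(missing_pieces(other_vocab, required_pieces))

# The eager lookup fails the same way the server does at startup.
try:
    eager_lookup(other_vocab, required_pieces)
except KeyError as err:
    print("KeyError:", err)
```

To run the same check against a real tokenizer, the vocab dict can be obtained from a Hugging Face tokenizer with `tokenizer.get_vocab()`, which maps token strings to ids.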

Two quick options while we sort it out:

  1. Pin to vLLM 0.20.0: known good, no other change needed.
  2. Or wait a couple of days: we'll ship a small reasoning-parser plugin (loadable via --reasoning-parser-plugin) along with usage instructions in the model card.

Will follow up here once it's published.

Hi @alescire94, we've just pushed the custom reasoning parser plugin.
You can find instructions on how to use it in the README.

Thank you again for flagging the issue.

iGenius-AI-Team changed discussion status to closed
