fix(article): EOS-trim the prefix in the tool-response delta
The §5 compute_delta slices the bridge at the template-closed prefix, which on ChatML/Qwen templates ends in <|im_end|>\n. A real sampler stops at <|im_end|>, so the trailing \n is never sampled. The delta therefore begins at <|im_start|>, and appending it to the sampled buffer yields <|im_end|><|im_start|>: the turn separator is dropped, leaving the sequence one token short at every tool-turn boundary.
Verified against apply_chat_template on the post's own Qwen2.5 example: buffer + delta != full render before the fix; equal after.
Fix: trim the prefix back to the last <|im_end|> before subtracting, so the separator lands in delta (loss-masked scaffolding) instead of vanishing between the two renders. Llama 3's template emits nothing after its stop token <|eot_id|>, so the trim is a no-op there, which is why the bug was easy to miss. This matches what TRL's _get_tool_suffix_ids already does.
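A minimal sketch of the trim, for illustration only. It works on plain lists of token IDs rather than a real tokenizer, and the IDs, the `trim_to_last_eos` helper, and this `compute_delta` signature are all assumptions made for the example, not the article's actual code or TRL's implementation.

```python
from typing import List

# Toy token IDs invented for this example (not real Qwen or Llama vocabulary IDs).
IM_START, IM_END, NL = 1, 2, 3
ASSISTANT, TOOL, CALL, RESULT = 10, 11, 12, 13


def trim_to_last_eos(prefix_ids: List[int], eos_id: int) -> List[int]:
    """Drop whatever the template emits after the final stop token."""
    for i in range(len(prefix_ids) - 1, -1, -1):
        if prefix_ids[i] == eos_id:
            return prefix_ids[: i + 1]
    return prefix_ids  # no stop token found: leave the prefix unchanged


def compute_delta(prefix_ids: List[int], full_ids: List[int], eos_id: int) -> List[int]:
    """Return the tokens to append to the sampled buffer (hypothetical signature)."""
    trimmed = trim_to_last_eos(prefix_ids, eos_id)
    if full_ids[: len(trimmed)] != trimmed:
        raise ValueError("prefix render is not a prefix of the full render")
    return full_ids[len(trimmed):]


# ChatML-style renders: the template closes the assistant turn with IM_END, NL.
prefix_ids = [IM_START, ASSISTANT, NL, CALL, IM_END, NL]
full_ids = prefix_ids + [
    IM_START, TOOL, NL, RESULT, IM_END, NL,  # tool-response turn
    IM_START, ASSISTANT, NL,                 # generation prompt
]

# The sampler stops at IM_END, so the buffer lacks the trailing NL.
sampled = [IM_START, ASSISTANT, NL, CALL, IM_END]

naive_delta = full_ids[len(prefix_ids):]                    # pre-fix slicing
fixed_delta = compute_delta(prefix_ids, full_ids, IM_END)   # EOS-trimmed slicing
```

With these toy renders, `sampled + naive_delta` is missing the separator while `sampled + fixed_delta` reproduces `full_ids`. When the prefix already ends at the stop token, as described for Llama 3, `trim_to_last_eos` returns it unchanged.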
🤖 Generated with Claude Code