ImageStudio


ImageStudio Maintainer Claude Opus 4.8 (1M context) committed 8 days ago

Commit

c4415be

1 Parent(s): a597aa8

fix: VLM assistant never stops (no generation_config) -> pin eos/pad tokens

The finetune ships no generation_config.json and its config has no
eos_token_id, so generate() had no stop token and ran to max_new_tokens
(endless output) even after the repetition fix. Pin eos_token_id to the
chat-template terminator <|im_end|> (+ <|endoftext|> fallback) and set
pad_token_id explicitly, derived from the tokenizer at load time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
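
The failure mode described in the message can be reproduced without any model. The sketch below is a toy decoding loop, not the transformers implementation: toy_generate and fake_model are hypothetical names, and the token ids are made up. It only illustrates that a loop with no stop ids has nothing to break on and therefore always spends the full max_new_tokens budget.

```python
from typing import Callable, Iterable, List, Optional


def toy_generate(
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    eos_token_ids: Optional[Iterable[int]] = None,
) -> List[int]:
    """Toy decoding loop: stop on a stop id, otherwise run to the budget."""
    stop = set(eos_token_ids or ())  # empty set means "never stop early"
    out: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(out)
        out.append(tok)
        if tok in stop:
            break
    return out


def fake_model(prefix: List[int]) -> int:
    """Hypothetical model: three content tokens (id 1), then id 7 forever."""
    return 1 if len(prefix) < 3 else 7


# No stop ids configured: the terminator is emitted but never recognised.
unbounded = toy_generate(fake_model, max_new_tokens=20)

# Stop ids pinned (7 plays the role of <|im_end|>, 9 of <|endoftext|>).
bounded = toy_generate(fake_model, max_new_tokens=20, eos_token_ids=[7, 9])

print(len(unbounded), bounded)
```

In the unbounded run the "model" keeps emitting its terminator, but nothing treats it as one, which matches the endless-output symptom this commit addresses.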

Files changed (1)

app.py +21 -1


@@ -159,7 +159,23 @@ vlm_model = AutoModelForImageTextToText.from_pretrained(
 )
 vlm_model.to("cuda").eval()
-print("Assistant loaded!")
+# This finetune ships NO generation_config.json and its config carries no
+# eos_token_id, so generate() has no stop token and runs to max_new_tokens
+# (endless output). Pin the stop tokens explicitly from the tokenizer: the chat
+# template ends each assistant turn with <|im_end|>, so that's the real
+# terminator (plus <|endoftext|> as a fallback).
+_vlm_tokenizer = getattr(vlm_processor, "tokenizer", vlm_processor)
+_VLM_EOS_IDS = sorted({
+    tid for tok in ("<|im_end|>", "<|endoftext|>")
+    for tid in (_vlm_tokenizer.convert_tokens_to_ids(tok),)
+    if isinstance(tid, int) and tid >= 0
+} | ({_vlm_tokenizer.eos_token_id} if _vlm_tokenizer.eos_token_id is not None else set()))
+_VLM_PAD_ID = (
+    _vlm_tokenizer.pad_token_id
+    if _vlm_tokenizer.pad_token_id is not None
+    else (_VLM_EOS_IDS[0] if _VLM_EOS_IDS else None)
+)
+print(f"Assistant loaded! (eos_ids={_VLM_EOS_IDS}, pad_id={_VLM_PAD_ID})")
 # =============================================================================
@@ -679,6 +695,10 @@ def _vlm_chat_core(message, image, reasoning, max_new_tokens):
                     # keeping decoding deterministic (important for prompt rewrites).
                     repetition_penalty=1.3,
                     no_repeat_ngram_size=3,
+                    # Explicit stop tokens — the model has no generation_config, so
+                    # without these generate() never stops and rambles to the budget.
+                    eos_token_id=_VLM_EOS_IDS,
+                    pad_token_id=_VLM_PAD_ID,
                     streamer=streamer,
                 )
         except Exception as exc:  # noqa: BLE001 - surfaced to the main thread
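
The set comprehension in the first hunk is compact but dense. A more explicit version of the same lookup might look like the sketch below. StubTokenizer and resolve_stop_and_pad are hypothetical names used only for illustration, and the token ids are made up. The sketch assumes the tokenizer exposes convert_tokens_to_ids, eos_token_id, pad_token_id and unk_token_id, and it adds one guard that is not in the commit: ids equal to the unknown-token id are skipped, on the assumption that a tokenizer may map a missing token to its unknown id rather than to None.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class StubTokenizer:
    """Hypothetical stand-in exposing only the attributes used below."""

    def __init__(
        self,
        vocab: Dict[str, int],
        eos_token_id: Optional[int] = None,
        pad_token_id: Optional[int] = None,
        unk_token_id: Optional[int] = None,
    ) -> None:
        self.vocab = vocab
        self.eos_token_id = eos_token_id
        self.pad_token_id = pad_token_id
        self.unk_token_id = unk_token_id

    def convert_tokens_to_ids(self, token: str) -> Optional[int]:
        # Missing tokens fall back to the unknown id (None if there is none).
        return self.vocab.get(token, self.unk_token_id)


def resolve_stop_and_pad(
    tokenizer,
    stop_tokens: Sequence[str] = ("<|im_end|>", "<|endoftext|>"),
) -> Tuple[List[int], Optional[int]]:
    """Return (sorted stop ids, pad id), mirroring the logic in the diff."""
    unk = getattr(tokenizer, "unk_token_id", None)
    ids = set()
    for tok in stop_tokens:
        tid = tokenizer.convert_tokens_to_ids(tok)
        # Extra guard (an assumption, not in the commit): ignore the unk id.
        if isinstance(tid, int) and tid >= 0 and tid != unk:
            ids.add(tid)
    if tokenizer.eos_token_id is not None:
        ids.add(tokenizer.eos_token_id)
    eos_ids = sorted(ids)
    pad = tokenizer.pad_token_id
    if pad is None:
        # Same fallback as the diff: reuse the lowest stop id for padding.
        pad = eos_ids[0] if eos_ids else None
    return eos_ids, pad


tok = StubTokenizer({"<|im_end|>": 5, "<|endoftext|>": 3}, eos_token_id=5)
eos_ids, pad_id = resolve_stop_and_pad(tok)
print(eos_ids, pad_id)
```

If the returned list is empty, generation would again have no stop token, so a caller may prefer to fail at load time rather than pass an empty list through.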