david167 committed on
Commit
0d85e38
·
1 Parent(s): 625d819

Fix response clipping: use robust assistant header detection instead of prompt length

Browse files
Files changed (1) hide show
  1. gradio_app.py +13 -1
gradio_app.py CHANGED
@@ -121,7 +121,19 @@ def chat_with_model(message, history, temperature):
121
 
122
  # Decode the generated text and remove the input prompt
123
  full_text = model_manager.tokenizer.decode(outputs[0], skip_special_tokens=True)
124
- response = full_text[len(prompt):].strip()
 
 
 
 
 
 
 
 
 
 
 
 
125
 
126
  if not response:
127
  response = "I couldn't generate a response. Please try a different prompt."
 
121
 
122
  # Decode the generated text and remove the input prompt
123
  full_text = model_manager.tokenizer.decode(outputs[0], skip_special_tokens=True)
124
+ # Use a more robust method to extract the response
125
+ # Look for the assistant header end and extract everything after it
126
+ assistant_start = "<|start_header_id|>assistant<|end_header_id|>"
127
+ if assistant_start in full_text:
128
+ # Find the position after the assistant header
129
+ response_start = full_text.find(assistant_start) + len(assistant_start)
130
+ response = full_text[response_start:].strip()
131
+ else:
132
+ # Fallback: try to remove the original prompt
133
+ try:
134
+ response = full_text[len(prompt):].strip()
135
+ except:
136
+ response = full_text.strip()
137
 
138
  if not response:
139
  response = "I couldn't generate a response. Please try a different prompt."