Spaces: Luigi/tiny-scribe
Luigi committed on Feb 5

Commit 27363be

1 Parent(s): d207764

fix: prevent models from copying schema descriptions as extracted content

Models (Gemma-3, LFM2-Extract) were treating descriptive values in schema
examples as actual extracted content to copy.

Changed prompts to use empty arrays in schema with separate descriptions:
- Before: "action_items": ["Specific action items with owner and deadline"]
- After: "action_items": [] + separate description line

This forces models to generate actual extracted content instead of copying
schema descriptions.

Applies to both:
- _build_schema_extraction_prompt() (LFM2-Extract optimized)
- _build_reasoning_extraction_prompt() (Qwen3 hybrid optimized)

Both functions are updated in their English and Traditional Chinese versions.
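The prompt layout described above can be sketched as follows. This is a minimal illustration, not the code in extraction.py: the helper name `build_schema_block` and the `FIELD_DESCRIPTIONS` mapping are hypothetical, and only the field names and English descriptions are taken from this commit.

```python
import json

# Field names and descriptions copied from the English prompt in this commit.
# The dict itself is a hypothetical container, not part of extraction.py.
FIELD_DESCRIPTIONS = {
    "action_items": "Specific action items with owner and deadline",
    "decisions": "Decisions made with rationale",
    "key_points": "Important discussion points",
    "open_questions": "Unresolved questions or concerns",
}


def build_schema_block(descriptions: dict[str, str]) -> str:
    """Render an empty-array JSON schema followed by one description line per key.

    Hypothetical helper: keeping descriptions outside the JSON means there is
    no string inside the schema arrays for a model to copy.
    """
    empty_schema = {key: [] for key in descriptions}
    schema_json = json.dumps(empty_schema, indent=2, ensure_ascii=False)
    description_lines = "\n".join(
        f"{key}: {text}" for key, text in descriptions.items()
    )
    return f"{schema_json}\n{description_lines}"


print(build_schema_block(FIELD_DESCRIPTIONS))
```

Generating both parts from one mapping keeps the schema keys and the description lines in sync, which may matter when the same prompt exists in several languages.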

Files changed (1)

meeting_summarizer/extraction.py +36 -16

@@ -330,23 +330,33 @@ def _build_schema_extraction_prompt(output_language: str) -> str:
         return """以 JSON 格式返回資料，使用以下架構：
 {
-  "action_items": ["包含負責人和截止日期的具體行動項目"],
-  "decisions": ["包含理由的決策"],
-  "key_points": ["重要討論要點"],
-  "open_questions": ["未解決的問題或疑慮"]
+  "action_items": [],
+  "decisions": [],
+  "key_points": [],
+  "open_questions": []
 }
+action_items: 包含負責人和截止日期的具體行動項目
+decisions: 包含理由的決策
+key_points: 重要討論要點
+open_questions: 未解決的問題或疑慮
 從使用者提供的逐字稿中提取。逐字稿可能包含重複、雜訊或不完整內容，請專注於有意義的對話內容，忽略重複的詞句。"""
     else:
         return """Return data as a JSON object with the following schema:
 {
-  "action_items": ["Specific action items with owner and deadline"],
-  "decisions": ["Decisions made with rationale"],
-  "key_points": ["Important discussion points"],
-  "open_questions": ["Unresolved questions or concerns"]
+  "action_items": [],
+  "decisions": [],
+  "key_points": [],
+  "open_questions": []
 }
+action_items: Specific action items with owner and deadline
+decisions: Decisions made with rationale
+key_points: Important discussion points
+open_questions: Unresolved questions or concerns
 Extract from the transcript provided by the user. The transcript may contain repetitions, noise, or incomplete sentences - focus on meaningful dialogue content and ignore repetitive phrases."""
@@ -366,12 +376,17 @@ def _build_reasoning_extraction_prompt(output_language: str) -> str:
 推理後，以 JSON 格式返回資料，使用以下架構：
 {
-  "action_items": ["包含負責人和截止日期的具體行動項目"],
-  "decisions": ["包含理由的決策"],
-  "key_points": ["重要討論要點"],
-  "open_questions": ["未解決的問題或疑慮"]
+  "action_items": [],
+  "decisions": [],
+  "key_points": [],
+  "open_questions": []
 }
+action_items: 包含負責人和截止日期的具體行動項目
+decisions: 包含理由的決策
+key_points: 重要討論要點
+open_questions: 未解決的問題或疑慮
 規則：
 - 每個項目必須是完整、獨立的句子
 - 在每個項目中包含上下文（誰、什麼、何時）
@@ -391,12 +406,17 @@ The transcript may contain repetitions, noise, or incomplete sentences - focus o
 After reasoning, return data as a JSON object with the following schema:
 {
-  "action_items": ["Specific action items with owner and deadline"],
-  "decisions": ["Decisions made with rationale"],
-  "key_points": ["Important discussion points"],
-  "open_questions": ["Unresolved questions or concerns"]
+  "action_items": [],
+  "decisions": [],
+  "key_points": [],
+  "open_questions": []
 }
+action_items: Specific action items with owner and deadline
+decisions: Decisions made with rationale
+key_points: Important discussion points
+open_questions: Unresolved questions or concerns
 Rules:
 - Each item must be a complete, standalone sentence
 - Include context (who, what, when) in each item
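
A regression check for this change could look roughly like the sketch below. It is an assumption, not part of the commit: the function name `schema_arrays_are_empty` is hypothetical, and it relies on the schema being the first brace-delimited block in the prompt with no nested braces, which holds for the prompts shown in this diff.

```python
import json
import re


def schema_arrays_are_empty(prompt: str) -> bool:
    """Return True if the first {...} block in the prompt is a non-empty JSON
    object whose values are all empty lists.

    Hypothetical helper for tests; assumes the schema contains no nested braces.
    """
    match = re.search(r"\{.*?\}", prompt, flags=re.DOTALL)
    if match is None:
        return False
    try:
        schema = json.loads(match.group(0))
    except json.JSONDecodeError:
        return False
    if not isinstance(schema, dict) or not schema:
        return False
    return all(value == [] for value in schema.values())


# Shortened stand-ins for the prompt text before and after this commit.
old_prompt = (
    "Return data as a JSON object with the following schema:\n"
    '{\n  "action_items": ["Specific action items with owner and deadline"]\n}\n'
)
new_prompt = (
    "Return data as a JSON object with the following schema:\n"
    '{\n  "action_items": []\n}\n'
    "action_items: Specific action items with owner and deadline\n"
)

print(schema_arrays_are_empty(old_prompt), schema_arrays_are_empty(new_prompt))
```

Running such a check against the output of both prompt builders, in both languages, would flag any later edit that puts descriptive strings back inside the schema arrays.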