fix: prevent models from copying schema descriptions as extracted content
Models (Gemma-3, LFM2-Extract) were treating the descriptive placeholder
values in the schema examples as extracted content and copying them.
Changed prompts to use empty arrays in schema with separate descriptions:
- Before: "action_items": ["Specific action items with owner and deadline"]
- After: "action_items": [] + separate description line
This forces models to generate actual extracted content instead of copying
schema descriptions.
Applies to both:
- _build_schema_extraction_prompt() (LFM2-Extract optimized)
- _build_reasoning_extraction_prompt() (Qwen3 hybrid optimized)
For both English and Traditional Chinese versions.
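The prompt layout described above can be sketched as follows. This is a minimal illustration, not the code in extraction.py: the names `FIELD_DESCRIPTIONS` and `build_schema_prompt` are hypothetical, and only the field names and English description strings are taken from this commit.

```python
import json

# Field descriptions as they appear in the English prompt of this commit.
# The constant name itself is a hypothetical choice for this sketch.
FIELD_DESCRIPTIONS = {
    "action_items": "Specific action items with owner and deadline",
    "decisions": "Decisions made with rationale",
    "key_points": "Important discussion points",
    "open_questions": "Unresolved questions or concerns",
}


def build_schema_prompt(descriptions: dict[str, str]) -> str:
    """Render a schema of empty arrays followed by one description line per field."""
    # Every field maps to an empty list, so the JSON example holds no copyable text.
    schema = json.dumps({name: [] for name in descriptions}, indent=2)
    # Descriptions live outside the JSON block as plain "name: description" lines.
    notes = "\n".join(f"{name}: {text}" for name, text in descriptions.items())
    return (
        "Return data as a JSON object with the following schema:\n\n"
        f"{schema}\n\n{notes}\n\n"
        "Extract from the transcript provided by the user."
    )


if __name__ == "__main__":
    print(build_schema_prompt(FIELD_DESCRIPTIONS))
```

Keeping the descriptions in one mapping would also let the English and Traditional Chinese prompts share the same rendering logic, though the commit itself edits the prompt strings directly.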
- meeting_summarizer/extraction.py +36 -16
meeting_summarizer/extraction.py CHANGED
@@ -330,23 +330,33 @@ def _build_schema_extraction_prompt(output_language: str) -> str:
 return """以 JSON 格式返回資料,使用以下架構:

 {
-"action_items": [
-"decisions": [
-"key_points": [
-"open_questions": [
+"action_items": [],
+"decisions": [],
+"key_points": [],
+"open_questions": []
 }

+action_items: 包含負責人和截止日期的具體行動項目
+decisions: 包合理由的決策
+key_points: 重要討論要點
+open_questions: 未解決的問題或疑慮
+
 從使用者提供的逐字稿中提取。逐字稿可能包含重複、雜訊或不完整內容,請專注於有意義的對話內容,忽略重複的詞句。"""
 else:
 return """Return data as a JSON object with the following schema:

 {
-"action_items": [
-"decisions": [
-"key_points": [
-"open_questions": [
+"action_items": [],
+"decisions": [],
+"key_points": [],
+"open_questions": []
 }

+action_items: Specific action items with owner and deadline
+decisions: Decisions made with rationale
+key_points: Important discussion points
+open_questions: Unresolved questions or concerns
+
 Extract from the transcript provided by the user. The transcript may contain repetitions, noise, or incomplete sentences - focus on meaningful dialogue content and ignore repetitive phrases."""


@@ -366,12 +376,17 @@ def _build_reasoning_extraction_prompt(output_language: str) -> str:

 推理後,以 JSON 格式返回資料,使用以下架構:
 {
-"action_items": [
-"decisions": [
-"key_points": [
-"open_questions": [
+"action_items": [],
+"decisions": [],
+"key_points": [],
+"open_questions": []
 }

+action_items: 包含負責人和截止日期的具體行動項目
+decisions: 包合理由的決策
+key_points: 重要討論要點
+open_questions: 未解決的問題或疑慮
+
 規則:
 - 每個項目必須是完整、獨立的句子
 - 在每個項目中包含上下文(誰、什麼、何時)
@@ -391,12 +406,17 @@ The transcript may contain repetitions, noise, or incomplete sentences - focus o

 After reasoning, return data as a JSON object with the following schema:
 {
-"action_items": [
-"decisions": [
-"key_points": [
-"open_questions": [
+"action_items": [],
+"decisions": [],
+"key_points": [],
+"open_questions": []
 }

+action_items: Specific action items with owner and deadline
+decisions: Decisions made with rationale
+key_points: Important discussion points
+open_questions: Unresolved questions or concerns
+
 Rules:
 - Each item must be a complete, standalone sentence
 - Include context (who, what, when) in each item
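One way to check whether a model still echoes the schema descriptions is to compare each extracted item against the description of its field. The sketch below is illustrative only and is not part of this commit: `find_copied_descriptions` and `DESCRIPTIONS` are hypothetical names, and it assumes the model output has already been parsed into a dict of string lists.

```python
# Hypothetical mapping of field names to the English description lines
# used in the prompts of this commit.
DESCRIPTIONS = {
    "action_items": "Specific action items with owner and deadline",
    "decisions": "Decisions made with rationale",
    "key_points": "Important discussion points",
    "open_questions": "Unresolved questions or concerns",
}


def find_copied_descriptions(extracted, descriptions):
    """Return (field, item) pairs whose text equals the field's description."""
    copied = []
    for field, description in descriptions.items():
        for item in extracted.get(field, []):
            # Compare case-insensitively and ignore surrounding whitespace.
            if item.strip().lower() == description.strip().lower():
                copied.append((field, item))
    return copied


if __name__ == "__main__":
    # Made-up model output: the first item is a copied description.
    sample = {
        "action_items": ["Specific action items with owner and deadline"],
        "decisions": ["Alice approved the budget on Friday because costs dropped."],
    }
    print(find_copied_descriptions(sample, DESCRIPTIONS))
```

An exact-match check like this only catches verbatim copies; paraphrased echoes would need a looser comparison.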