feat: Implement Advanced Mode 3-stage pipeline (extraction → deduplication → synthesis)
Add complete 3-model pipeline with independent model registries and parameters:
**New Modules:**
- meeting_summarizer/__init__.py: Package initialization
- meeting_summarizer/trace.py: Tracer with extraction/dedup/synthesis logging
- meeting_summarizer/extraction.py: Complete pipeline (~700 lines)
  - NativeTokenizer: Token counting without llama.cpp
  - EmbeddingModel: Embedding computation for deduplication
  - stream_extract_from_window(): Stage 1 extraction with reasoning
  - deduplicate_items(): Stage 2 semantic deduplication
  - stream_synthesize_executive_summary(): Stage 3 synthesis
**Model Registries (32 models, fully independent):**
- EXTRACTION_MODELS (13 models, ≤1.7B): extraction-optimized (temp 0.1-0.3)
  - Includes LFM2-Extract 350M & 1.2B (specialized extraction models)
  - 2 hybrid models with reasoning toggle (Qwen3 600M & 1.7B)
- SYNTHESIS_MODELS (16 models, 1B-30B): synthesis-optimized (temp 0.7-0.9)
  - Fully independent from AVAILABLE_MODELS (no shared references)
  - 2 hybrid + 5 thinking-only models with reasoning support
- EMBEDDING_MODELS (4 models): granite-107m (default), granite-278m, gemma-300m, qwen-600m
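Stage 2's embedding-based deduplication can be sketched as a greedy cosine-similarity filter. This is a minimal illustration with toy vectors; the real `EmbeddingModel`, threshold, and item schema live in `meeting_summarizer/extraction.py`:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def deduplicate(items, embeddings, threshold=0.9):
    """Keep an item only if it is not too similar to any already-kept item."""
    kept, kept_vecs = [], []
    for item, vec in zip(items, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(item)
            kept_vecs.append(vec)
    return kept

items = ["ship v1 Friday", "release v1 on Friday", "hire a designer"]
vecs = [[1.0, 0.1, 0.0], [0.98, 0.15, 0.02], [0.0, 0.2, 1.0]]
result = deduplicate(items, vecs)  # the first two collapse into one
```

The greedy pass keeps the first occurrence of each semantic cluster, which matches the pipeline's goal of preserving the earliest extraction of a repeated point.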
**Core Functions (app.py):**
- get_model_config(): Role-aware configuration resolver
- load_model_for_role(): Sequential loading with user n_ctx support
- unload_model(): Explicit memory cleanup
- build_extraction_system_prompt(): Bilingual + reasoning support
- summarize_advanced(): Main orchestrator (239 lines) with sequential model loading/unloading
**UI Implementation:**
- Mode tabs (Standard vs Advanced)
- 11 Advanced Mode controls (3 dropdowns, 4 sliders, 2 checkboxes, 2 radios)
- Conditional reasoning checkbox visibility per stage
- Submit button router (auto-detects mode and routes to appropriate handler)
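The submit-button router can be sketched as a thin dispatcher over the two mode handlers. The handler names and argument tuples below are placeholders, not the exact app.py signatures:

```python
def route_submit(mode, standard_args, advanced_args,
                 summarize_standard, summarize_advanced):
    """Dispatch the single Submit button to the handler for the active mode tab.

    Both handlers are generators (they stream progress updates), so the
    router re-yields whichever one the mode selects.
    """
    if mode == "advanced":
        yield from summarize_advanced(*advanced_args)
    else:
        yield from summarize_standard(*standard_args)
```

Keeping one button and routing inside the handler avoids duplicating Gradio event wiring per tab.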
**Features:**
- Sequential model loading/unloading (memory-safe for HF Spaces Free Tier)
- Bilingual support (auto-detect in extraction, Chinese conversion at end)
- Live progress streaming with ticker updates
- Trace logging (JSONL embedded in download JSON)
- Independent parameters per stage (no cross-contamination)
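Embedding the trace as JSONL inside the download JSON can work as below. This is a sketch; the field names are illustrative, not the tracer's actual schema:

```python
import json

def build_download_json(summary, trace_events):
    """Embed the per-stage trace as a JSONL string inside the downloadable JSON."""
    trace_jsonl = "\n".join(json.dumps(e, ensure_ascii=False) for e in trace_events)
    return json.dumps({"summary": summary, "trace_jsonl": trace_jsonl},
                      ensure_ascii=False)

doc = build_download_json(
    "Q3 goals agreed.",
    [{"stage": "extraction", "items": 12},
     {"stage": "dedup", "kept": 9}],
)
```

Storing the trace as one JSONL string keeps the outer document a single valid JSON object while letting tools stream-parse the trace line by line.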
**Pipeline:**
Stage 1: Extraction → Parse transcript windows → Extract JSON items
Stage 2: Deduplication → Compute embeddings → Remove semantic duplicates
Stage 3: Synthesis → Generate executive summary from deduplicated items
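Because only one model fits in memory on the Free Tier, the three stages run strictly sequentially: load a model, run its stage, unload, then load the next. A minimal sketch of that orchestration, with stand-in stage functions rather than the real extraction/synthesis calls:

```python
import gc

def run_advanced_pipeline(transcript_windows, extract_fn, dedup_fn, synth_fn,
                          load_model, unload_model):
    """Run extraction → deduplication → synthesis with one model resident at a time."""
    # Stage 1: extraction -- a small model runs over every transcript window
    model = load_model("extraction")
    items = [it for w in transcript_windows for it in extract_fn(model, w)]
    unload_model(model); gc.collect()  # free RAM before loading the next model

    # Stage 2: deduplication -- embedding model only, no LLM generation
    model = load_model("embedding")
    unique_items = dedup_fn(model, items)
    unload_model(model); gc.collect()

    # Stage 3: synthesis -- a larger model writes the executive summary
    model = load_model("synthesis")
    summary = synth_fn(model, unique_items)
    unload_model(model); gc.collect()
    return summary
```

The explicit unload-then-collect between stages is what makes a 30B-class synthesis model usable after a 1B-class extractor on the same small instance.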
Code statistics: ~2,400 new lines, 3 new files, 11 new functions, 3 new classes
- app.py +1249 -67
- meeting_summarizer/__init__.py +13 -0
- meeting_summarizer/extraction.py +705 -0
- meeting_summarizer/trace.py +197 -0
```diff
@@ -660,6 +660,459 @@ AVAILABLE_MODELS = {
 DEFAULT_MODEL_KEY = "qwen3_600m_q4"
 
 
 def load_model(model_key: str = None, n_threads: int = 2) -> Tuple[Llama, str]:
     """
     Load model with CPU optimizations. Only reloads if model changes.
```

The 459 added lines in this hunk are the new EXTRACTION_MODELS and SYNTHESIS_MODELS registries, shown in full further below.
```diff
@@ -766,6 +1219,452 @@ def update_reasoning_visibility(model_key):
     return gr.update(visible=True, value=True, interactive=True, label="Enable Reasoning Mode")
 
 
 def download_summary_json(summary, thinking, model_key, language, metrics):
     """Generate JSON file with summary and metadata."""
     import json
```

The 452 added lines in this hunk are the Advanced Mode helper functions (`get_model_config`, `load_model_for_role`), shown in full further below.
```diff
@@ -1667,80 +2566,190 @@ def create_interface():
             )
 
             # ==========================================
-            # Section 2:
             # ==========================================
-            with gr.Tabs() as
 
-                #
-                with gr.TabItem("
-
-                    preset_choices = [
-                        (info["name"] + (" ⚡" if info.get("supports_reasoning", False) and not info.get("supports_toggle", False) else ""), key)
-                        for key, info in AVAILABLE_MODELS.items()
-                        if key != "custom_hf"
-                    ]
-
-                    model_dropdown = gr.Dropdown(
-                        choices=preset_choices,
-                        value=DEFAULT_MODEL_KEY,
-                        label="Select Model",
-                        info="Smaller = faster. ⚡ = Always-reasoning models."
-                    )
-
-                    enable_reasoning = gr.Checkbox(
-                        value=True,
-                        label="Enable Reasoning Mode",
-                        info="Uses /think for deeper analysis (slower) or /no_think for direct output (faster).",
-                        interactive=True,
-                        visible=AVAILABLE_MODELS[DEFAULT_MODEL_KEY].get("supports_toggle", False)
-                    )
 
-
-
-
-
-
-
-
 
-                #
-                with gr.TabItem("
-                    gr.HTML('<div style="font-size: 0.
 
-                    #
-
-
-
-
-
 
-                    #
-
-
-
-
-
-
-
 
-                    #
-
 
-                    #
-
-
-
-
-
 
-
 
-                    #
-
-
-
-
             )
 
             # ==========================================
```
```diff
@@ -1973,6 +2982,60 @@ def create_interface():
             outputs=[system_prompt_debug],
         )
 
         # Debounced auto-discovery for custom repo ID (500ms delay)
         import time as time_module
 
```
```diff
@@ -2138,10 +3201,129 @@ def create_interface():
             outputs=[custom_info_output],
         )
 
-        #
         submit_btn.click(
-            fn=
-            inputs=[
             outputs=[thinking_output, summary_output, info_output, metrics_state, system_prompt_debug],
             show_progress="full"
         )
```
Expanded view of the first hunk (the two new model registries):

```diff
@@ -660,6 +660,459 @@ AVAILABLE_MODELS = {
 DEFAULT_MODEL_KEY = "qwen3_600m_q4"
 
 
+# ===== ADVANCED MODE: EXTRACTION MODELS REGISTRY (13 models, ≤1.7B) =====
+# Used exclusively for Stage 1: Extraction (transcript windows → structured JSON)
+# Extraction-optimized settings: Low temperature (0.1-0.3) for deterministic output
+
+EXTRACTION_MODELS = {
+    "falcon_h1_100m": {
+        "name": "Falcon-H1 100M",
+        "repo_id": "mradermacher/Falcon-H1-Tiny-Multilingual-100M-Instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "100M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "gemma3_270m": {
+        "name": "Gemma-3 270M",
+        "repo_id": "unsloth/gemma-3-270m-it-qat-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "270M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "ernie_300m": {
+        "name": "ERNIE-4.5 0.3B (131K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-0.3B-PT-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "300M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_350m": {
+        "name": "Granite-4.0 350M",
+        "repo_id": "unsloth/granite-4.0-h-350m-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.1,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_350m": {
+        "name": "LFM2 350M",
+        "repo_id": "LiquidAI/LFM2-350M-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_extract_350m": {
+        "name": "LFM2-Extract 350M (Specialized)",
+        "repo_id": "LiquidAI/LFM2-350M-Extract-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "bitcpm4_500m": {
+        "name": "BitCPM4 0.5B (128K Context)",
+        "repo_id": "openbmb/BitCPM4-0.5B-GGUF",
+        "filename": "*q4_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "500M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "hunyuan_500m": {
+        "name": "Hunyuan 0.5B (256K Context)",
+        "repo_id": "mradermacher/Hunyuan-0.5B-Instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 262144,
+        "default_n_ctx": 4096,
+        "params_size": "500M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_600m_q4": {
+        "name": "Qwen3 0.6B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-0.6B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "600M",
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 20,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_1_1b_q8": {
+        "name": "Granite 3.1 1B-A400M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-1b-a400m-instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "1B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "falcon_h1_1.5b_q4": {
+        "name": "Falcon-H1 1.5B Q4",
+        "repo_id": "unsloth/Falcon-H1-1.5B-Deep-Instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.5B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_1.7b_q4": {
+        "name": "Qwen3 1.7B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-1.7B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.7B",
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 20,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_extract_1.2b": {
+        "name": "LFM2-Extract 1.2B (Specialized) ⭐",
+        "repo_id": "LiquidAI/LFM2-1.2B-Extract-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.2B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+}
+
+DEFAULT_EXTRACTION_MODEL = "lfm2_extract_1.2b"
+
+
+# ===== ADVANCED MODE: SYNTHESIS MODELS REGISTRY (16 models, 1B-30B) =====
+# Used exclusively for Stage 3: Synthesis (deduplicated items → executive summary)
+# Synthesis-optimized settings: Higher temperature (0.7-0.9) for creative synthesis
+# FULLY INDEPENDENT from AVAILABLE_MODELS (no shared references)
+
+SYNTHESIS_MODELS = {
+    "granite_3_1_1b_q8": {
+        "name": "Granite 3.1 1B-A400M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-1b-a400m-instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "falcon_h1_1.5b_q4": {
+        "name": "Falcon-H1 1.5B Q4",
+        "repo_id": "unsloth/Falcon-H1-1.5B-Deep-Instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_1.7b_q4": {
+        "name": "Qwen3 1.7B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-1.7B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_3_2b_q4": {
+        "name": "Granite 3.3 2B Instruct (128K Context)",
+        "repo_id": "ibm-granite/granite-3.3-2b-instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "youtu_llm_2b_q8": {
+        "name": "Youtu-LLM 2B (128K Context)",
+        "repo_id": "tencent/Youtu-LLM-2B-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_2_6b_transcript": {
+        "name": "LFM2 2.6B Transcript (32K Context)",
+        "repo_id": "LiquidAI/LFM-2.6B-Transcript-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "breeze_3b_q4": {
+        "name": "Breeze 3B Q4 (32K Context)",
+        "repo_id": "mradermacher/breeze-3b-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_1_3b_q4": {
+        "name": "Granite 3.1 3B-A800M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-3b-a800m-instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "qwen3_4b_thinking_q3": {
+        "name": "Qwen3 4B Thinking (256K Context)",
+        "repo_id": "unsloth/Qwen3-4B-Thinking-2507-GGUF",
+        "filename": "*Q3_K_M.gguf",
+        "max_context": 262144,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite4_tiny_q3": {
+        "name": "Granite 4.0 Tiny 7B (128K Context)",
+        "repo_id": "ibm-research/granite-4.0-Tiny-7B-Instruct-GGUF",
+        "filename": "*Q3_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "ernie_21b_pt_q1": {
+        "name": "ERNIE-4.5 21B PT (128K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-21B-A3B-PT-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "ernie_21b_thinking_q1": {
+        "name": "ERNIE-4.5 21B Thinking (128K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.9,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "glm_4_7_flash_reap_30b": {
+        "name": "GLM-4.7-Flash-REAP-30B Thinking (128K Context)",
+        "repo_id": "unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "glm_4_7_flash_30b_iq2": {
+        "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
+        "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
+        "filename": "*IQ2_XXS.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_30b_thinking_q1": {
+        "name": "Qwen3 30B Thinking (256K Context)",
+        "repo_id": "unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 262144,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_30b_instruct_q1": {
+        "name": "Qwen3 30B Instruct (256K Context)",
+        "repo_id": "unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 262144,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+}
+
+DEFAULT_SYNTHESIS_MODEL = "qwen3_1.7b_q4"
+
+
 def load_model(model_key: str = None, n_threads: int = 2) -> Tuple[Llama, str]:
     """
     Load model with CPU optimizations. Only reloads if model changes.
```
Expanded view of the second hunk (the Advanced Mode helper functions; the diff is cut off mid-function by the page):

```diff
@@ -766,6 +1219,452 @@ def update_reasoning_visibility(model_key):
     return gr.update(visible=True, value=True, interactive=True, label="Enable Reasoning Mode")
 
 
+# ===== ADVANCED MODE: HELPER FUNCTIONS =====
+
+def get_model_config(model_key: str, model_role: str) -> Dict[str, Any]:
+    """
+    Get model configuration based on role.
+
+    Ensures same model (e.g., qwen3_1.7b_q4) uses DIFFERENT settings
+    for extraction vs synthesis.
+
+    Args:
+        model_key: Model identifier (e.g., "qwen3_1.7b_q4")
+        model_role: "extraction" or "synthesis"
+
+    Returns:
+        Model configuration dict with role-specific settings
+
+    Raises:
+        ValueError: If model_key not available for specified role
+    """
+    if model_role == "extraction":
+        if model_key not in EXTRACTION_MODELS:
+            available = ", ".join(list(EXTRACTION_MODELS.keys())[:3]) + "..."
+            raise ValueError(
+                f"Model '{model_key}' not available for extraction role. "
+                f"Available: {available}"
+            )
+        return EXTRACTION_MODELS[model_key]
+
+    elif model_role == "synthesis":
+        if model_key not in SYNTHESIS_MODELS:
+            available = ", ".join(list(SYNTHESIS_MODELS.keys())[:3]) + "..."
+            raise ValueError(
+                f"Model '{model_key}' not available for synthesis role. "
+                f"Available: {available}"
+            )
+        return SYNTHESIS_MODELS[model_key]
+
+    else:
+        raise ValueError(
+            f"Unknown model role: '{model_role}'. "
+            f"Must be 'extraction' or 'synthesis'"
+        )
+
+
+def load_model_for_role(
+    model_key: str,
+    model_role: str,
+    n_threads: int = 2,
+    user_n_ctx: Optional[int] = None
+) -> Tuple[Llama, str]:
+    """
+    Load model with role-specific configuration.
+
+    Args:
+        model_key: Model identifier
+        model_role: "extraction" or "synthesis"
+        n_threads: CPU threads
+        user_n_ctx: User-specified n_ctx (extraction only, from slider)
+
+    Returns:
+        (loaded_model, info_message)
+
+    Raises:
+        Exception: If model loading fails (graceful failure)
+    """
+    try:
+        config = get_model_config(model_key, model_role)
+
+        # Calculate n_ctx
+        if model_role == "extraction" and user_n_ctx is not None:
+            n_ctx = min(user_n_ctx, config["max_context"], MAX_USABLE_CTX)
+        else:
+            # Synthesis or default extraction
+            n_ctx = min(config.get("max_context", 8192), MAX_USABLE_CTX)
+
+        # Detect GPU support
+        requested_ngl = int(os.environ.get("N_GPU_LAYERS", 0))
+        n_gpu_layers = requested_ngl
+
+        if requested_ngl != 0:
+            try:
+                from llama_cpp import llama_supports_gpu_offload
+                gpu_available = llama_supports_gpu_offload()
+                if not gpu_available:
+                    logger.warning("GPU requested but not available. Using CPU.")
+                    n_gpu_layers = 0
+            except Exception as e:
+                logger.warning(f"Could not detect GPU: {e}. Using CPU.")
+                n_gpu_layers = 0
+
+        # Load model
+        logger.info(f"Loading {config['name']} for {model_role} role (n_ctx={n_ctx:,})")
+
+        llm = Llama.from_pretrained(
+            repo_id=config["repo_id"],
+            filename=config["filename"],
+            n_ctx=n_ctx,
+            n_batch=min(2048, n_ctx),
+            n_threads=n_threads,
+            n_threads_batch=n_threads,
+            n_gpu_layers=n_gpu_layers,
+            verbose=False,
+            seed=1337,
+        )
+
+        info_msg = (
+            f"✅ Loaded: {config['name']} for {model_role} "
+            f"(n_ctx={n_ctx:,}, threads={n_threads})"
+        )
+        logger.info(info_msg)
+
+        return llm, info_msg
+
+    except Exception as e:
+        # Graceful failure - let user select different model
+        error_msg = (
+            f"❌ Failed to load {model_key} for {model_role}: {str(e)}\n\n"
```
|
| 1339 |
+
f"Please select a different model and try again."
|
| 1340 |
+
)
|
| 1341 |
+
logger.error(error_msg, exc_info=True)
|
| 1342 |
+
raise Exception(error_msg)
|
| 1343 |
+
|
| 1344 |
+
|
| 1345 |
+
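The n_ctx clamping in `load_model_for_role()` can be sketched as a standalone helper. This is a minimal illustration, not the app's code; `resolve_n_ctx` is a hypothetical name, and `MAX_USABLE_CTX` stands in for the module-level constant:

```python
# Hypothetical standalone version of the n_ctx clamping logic above.
# MAX_USABLE_CTX stands in for the module constant (value assumed here).
MAX_USABLE_CTX = 8192

def resolve_n_ctx(user_n_ctx, max_context, default=8192):
    """Clamp the context window to the model's limit and the platform cap."""
    if user_n_ctx is not None:
        # Extraction path: honor the slider but never exceed either limit
        return min(user_n_ctx, max_context, MAX_USABLE_CTX)
    # Synthesis path: use the model's max_context (or a default) under the cap
    return min(max_context if max_context else default, MAX_USABLE_CTX)
```

The triple `min()` means an over-eager slider value is silently reduced rather than rejected.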
def unload_model(llm: Optional[Llama], model_name: str = "model") -> None:
    """Explicitly unload model and trigger garbage collection."""
    if llm:
        logger.info(f"Unloading {model_name}")
        del llm
        gc.collect()
        time.sleep(0.5)  # Allow OS to reclaim memory

def build_extraction_system_prompt(
    output_language: str,
    supports_reasoning: bool,
    supports_toggle: bool,
    enable_reasoning: bool
) -> str:
    """
    Build extraction system prompt with optional reasoning mode.

    Args:
        output_language: "en" or "zh-TW" (auto-detected from transcript)
        supports_reasoning: Model has reasoning capability
        supports_toggle: User can toggle reasoning on/off
        enable_reasoning: User's choice (only applies if supports_toggle=True)

    Returns:
        System prompt string
    """
    # Determine reasoning mode
    if supports_toggle and enable_reasoning:
        # Hybrid model with reasoning enabled
        reasoning_instruction_en = """
Use your reasoning capabilities to analyze the content before extracting.

Your reasoning should:
1. Identify key decision points and action items
2. Distinguish explicit decisions from general discussion
3. Categorize information appropriately (action vs point vs question)

After reasoning, output ONLY valid JSON."""

        reasoning_instruction_zh = """
使用你的推理能力分析內容後再進行提取。

你的推理應該:
1. 識別關鍵決策點和行動項目
2. 區分明確決策與一般討論
3. 適當分類資訊(行動 vs 要點 vs 問題)

推理後,僅輸出 JSON。"""
    else:
        reasoning_instruction_en = ""
        reasoning_instruction_zh = ""

    # Build full prompt
    if output_language == "zh-TW":
        return f"""你是會議分析助手。從逐字稿中提取結構化資訊。
{reasoning_instruction_zh}

僅輸出有效的 JSON,使用此精確架構:
{{
  "action_items": ["包含負責人和截止日期的任務", ...],
  "decisions": ["包含理由的決策", ...],
  "key_points": ["重要討論要點", ...],
  "open_questions": ["未解決的問題或疑慮", ...]
}}

規則:
- 每個項目必須是完整、獨立的句子
- 在每個項目中包含上下文(誰、什麼、何時)
- 如果類別沒有項目,使用空陣列 []
- 僅輸出 JSON,無 markdown,無解釋"""

    else:  # English
        return f"""You are a meeting analysis assistant. Extract structured information from the transcript.
{reasoning_instruction_en}

Output ONLY valid JSON with this exact schema:
{{
  "action_items": ["Task with owner and deadline", ...],
  "decisions": ["Decision made with rationale", ...],
  "key_points": ["Important discussion point", ...],
  "open_questions": ["Unresolved question or concern", ...]
}}

Rules:
- Each item must be a complete, standalone sentence
- Include context (who, what, when) in each item
- If a category has no items, use empty array []
- Output ONLY JSON, no markdown, no explanations"""

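Since the prompt above instructs the model to emit a fixed four-key JSON schema, a small tolerant parser illustrates how such output could be validated. This is a sketch, not the app's actual parsing code; `parse_extraction_output` and `EXPECTED_KEYS` are hypothetical names:

```python
import json

# Sketch: validate model output against the four-category schema in the
# prompt above, filling any missing category with an empty list.
EXPECTED_KEYS = ("action_items", "decisions", "key_points", "open_questions")

def parse_extraction_output(raw: str) -> dict:
    """Parse extraction JSON, tolerating missing categories."""
    data = json.loads(raw)  # raises ValueError/JSONDecodeError on bad output
    return {key: list(data.get(key, [])) for key in EXPECTED_KEYS}
```

Normalizing to the full key set keeps downstream merge/dedup code free of per-category existence checks.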
def summarize_advanced(
    transcript: str,
    extraction_model_key: str,
    embedding_model_key: str,
    synthesis_model_key: str,
    extraction_n_ctx: int,
    overlap_turns: int,
    similarity_threshold: float,
    enable_extraction_reasoning: bool,
    enable_synthesis_reasoning: bool,
    output_language: str,
    max_tokens: int,
    enable_logging: bool,
    n_threads: int = 2
) -> Generator[Dict[str, Any], None, None]:
    """
    Advanced 3-stage pipeline: Extraction → Deduplication → Synthesis.

    Yields progress updates as dicts with keys:
    - stage: "extraction" | "deduplication" | "synthesis" | "complete" | "error"
    - ticker: Progress ticker text (for extraction)
    - thinking: Thinking/reasoning content
    - summary: Final summary (for synthesis/complete)
    - error: Error message (if any)
    - trace_stats: Summary statistics (on complete)
    """
    from meeting_summarizer.trace import Tracer
    from meeting_summarizer.extraction import (
        NativeTokenizer, EmbeddingModel, Window,
        stream_extract_from_window, deduplicate_items, stream_synthesize_executive_summary
    )

    # Initialize tracer
    tracer = Tracer(enabled=enable_logging)
    tokenizer = NativeTokenizer()

    extraction_llm = None
    embedding_model = None
    synthesis_llm = None

    try:
        # ===== STAGE 1: EXTRACTION =====
        yield {"stage": "extraction", "ticker": "Loading extraction model...", "thinking": "", "summary": ""}

        extraction_llm, load_msg = load_model_for_role(
            model_key=extraction_model_key,
            model_role="extraction",
            n_threads=n_threads,
            user_n_ctx=extraction_n_ctx
        )

        yield {"stage": "extraction", "ticker": load_msg, "thinking": "", "summary": ""}

        # Create windows from transcript (simple split by turns for now)
        # In production, this would be more sophisticated
        lines = [l.strip() for l in transcript.split('\n') if l.strip()]

        # Simple windowing: split into chunks based on token count
        windows = []
        current_window = []
        current_tokens = 0
        window_id = 1

        for line_num, line in enumerate(lines):
            line_tokens = tokenizer.count(line)

            if current_tokens + line_tokens > extraction_n_ctx and current_window:
                # Create window
                window_content = '\n'.join(current_window)
                windows.append(Window(
                    id=window_id,
                    content=window_content,
                    start_turn=line_num - len(current_window),
                    end_turn=line_num - 1,
                    token_count=current_tokens
                ))
                window_id += 1

                # Start new window with overlap
                overlap_lines = current_window[-overlap_turns:] if len(current_window) >= overlap_turns else current_window
                current_window = overlap_lines + [line]
                current_tokens = sum(tokenizer.count(l) for l in current_window)
            else:
                current_window.append(line)
                current_tokens += line_tokens

        # Add final window
        if current_window:
            window_content = '\n'.join(current_window)
            windows.append(Window(
                id=window_id,
                content=window_content,
                start_turn=len(lines) - len(current_window),
                end_turn=len(lines) - 1,
                token_count=current_tokens
            ))

        total_windows = len(windows)
        yield {"stage": "extraction", "ticker": f"Created {total_windows} windows", "thinking": "", "summary": ""}

        # Extract from each window
        all_items = {"action_items": [], "decisions": [], "key_points": [], "open_questions": []}

        extraction_config = get_model_config(extraction_model_key, "extraction")

        for window in windows:
            for ticker, thinking, partial_items, is_complete in stream_extract_from_window(
                extraction_llm=extraction_llm,
                window=window,
                window_id=window.id,
                total_windows=total_windows,
                tracer=tracer,
                tokenizer=tokenizer,
                model_config=extraction_config,
                enable_reasoning=enable_extraction_reasoning
            ):
                yield {"stage": "extraction", "ticker": ticker, "thinking": thinking, "summary": ""}

                if is_complete:
                    # Merge items
                    for category, items in partial_items.items():
                        all_items[category].extend(items)

        # Unload extraction model
        unload_model(extraction_llm, "extraction model")
        extraction_llm = None

        total_extracted = sum(len(v) for v in all_items.values())
        yield {"stage": "extraction", "ticker": f"✅ Extracted {total_extracted} total items", "thinking": "", "summary": ""}

        # ===== STAGE 2: DEDUPLICATION =====
        yield {"stage": "deduplication", "ticker": "Loading embedding model...", "thinking": "", "summary": ""}

        embedding_model = EmbeddingModel(embedding_model_key, n_threads=n_threads)
        load_msg = embedding_model.load()

        yield {"stage": "deduplication", "ticker": load_msg, "thinking": "", "summary": ""}

        # Deduplicate
        deduplicated_items = deduplicate_items(
            all_items=all_items,
            embedding_model=embedding_model,
            similarity_threshold=similarity_threshold,
            tracer=tracer
        )

        # Unload embedding model
        embedding_model.unload()
        embedding_model = None

        total_deduplicated = sum(len(v) for v in deduplicated_items.values())
        duplicates_removed = total_extracted - total_deduplicated

        yield {
            "stage": "deduplication",
            "ticker": f"✅ Deduplication complete: {total_extracted} → {total_deduplicated} ({duplicates_removed} duplicates removed)",
            "thinking": "",
            "summary": ""
        }

        # ===== STAGE 3: SYNTHESIS =====
        yield {"stage": "synthesis", "ticker": "", "thinking": "", "summary": "Loading synthesis model..."}

        synthesis_llm, load_msg = load_model_for_role(
            model_key=synthesis_model_key,
            model_role="synthesis",
            n_threads=n_threads
        )

        yield {"stage": "synthesis", "ticker": "", "thinking": "", "summary": load_msg}

        # Synthesize
        synthesis_config = get_model_config(synthesis_model_key, "synthesis")
        final_summary = ""
        final_thinking = ""

        for summary_chunk, thinking_chunk, is_complete in stream_synthesize_executive_summary(
            synthesis_llm=synthesis_llm,
            deduplicated_items=deduplicated_items,
            model_config=synthesis_config,
            output_language=output_language,
            enable_reasoning=enable_synthesis_reasoning,
            max_tokens=max_tokens,
            tracer=tracer
        ):
            final_summary = summary_chunk
            final_thinking = thinking_chunk
            yield {"stage": "synthesis", "ticker": "", "thinking": thinking_chunk, "summary": summary_chunk}

        # Unload synthesis model
        unload_model(synthesis_llm, "synthesis model")
        synthesis_llm = None

        # Apply Chinese conversion if needed
        if output_language == "zh-TW":
            converter = OpenCC('s2twp')
            final_summary = converter.convert(final_summary)
            if final_thinking:
                final_thinking = converter.convert(final_thinking)

        # Get trace stats
        trace_stats = tracer.get_summary_stats()

        yield {
            "stage": "complete",
            "ticker": "",
            "thinking": final_thinking,
            "summary": final_summary,
            "trace_stats": trace_stats,
            "trace_json": tracer.get_trace_json()
        }

    except Exception as e:
        logger.error(f"Advanced pipeline error: {e}", exc_info=True)

        # Cleanup
        if extraction_llm:
            unload_model(extraction_llm, "extraction model")
        if embedding_model:
            embedding_model.unload()
        if synthesis_llm:
            unload_model(synthesis_llm, "synthesis model")

        yield {
            "stage": "error",
            "ticker": "",
            "thinking": "",
            "summary": "",
            "error": str(e)
        }

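The token-budget windowing with turn overlap used inside `summarize_advanced()` can be sketched as a self-contained function. This is an illustration only; `make_windows` is a hypothetical name, and a whitespace word count stands in for the real `NativeTokenizer`:

```python
# Sketch of token-budget windowing with turn overlap, as in the extraction
# stage above. len(line.split()) stands in for the real tokenizer.
def make_windows(lines, budget, overlap_turns):
    count = lambda s: len(s.split())
    windows, current, tokens = [], [], 0
    for line in lines:
        t = count(line)
        if tokens + t > budget and current:
            # Close the current window, then carry the last few turns forward
            windows.append(list(current))
            current = current[-overlap_turns:] + [line]
            tokens = sum(count(l) for l in current)
        else:
            current.append(line)
            tokens += t
    if current:
        windows.append(current)
    return windows
```

The overlap means the last `overlap_turns` lines of each window reappear at the start of the next, so items spanning a window boundary are still seen in full, at the cost of some re-extraction (which stage 2 then deduplicates).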
def download_summary_json(summary, thinking, model_key, language, metrics):
    """Generate JSON file with summary and metadata."""
    import json
    )

    # ==========================================
    # Section 2: Mode Selection (Standard vs Advanced)
    # ==========================================
    with gr.Tabs() as mode_tabs:

        # ===== STANDARD MODE =====
        with gr.TabItem("📊 Standard Mode"):
            gr.HTML('<div style="font-size: 0.9em; color: #64748b; margin-bottom: 10px;">Single-model direct summarization</div>')

            with gr.Tabs() as model_tabs:

                # --- Tab 1: Preset Models ---
                with gr.TabItem("🤖 Preset Models"):
                    # Filter out custom_hf from preset choices
                    preset_choices = [
                        (info["name"] + (" ⚡" if info.get("supports_reasoning", False) and not info.get("supports_toggle", False) else ""), key)
                        for key, info in AVAILABLE_MODELS.items()
                        if key != "custom_hf"
                    ]

                    model_dropdown = gr.Dropdown(
                        choices=preset_choices,
                        value=DEFAULT_MODEL_KEY,
                        label="Select Model",
                        info="Smaller = faster. ⚡ = Always-reasoning models."
                    )

                    enable_reasoning = gr.Checkbox(
                        value=True,
                        label="Enable Reasoning Mode",
                        info="Uses /think for deeper analysis (slower) or /no_think for direct output (faster).",
                        interactive=True,
                        visible=AVAILABLE_MODELS[DEFAULT_MODEL_KEY].get("supports_toggle", False)
                    )

                    # Model info for preset models
                    gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Model Information</div>')
                    _default_threads = DEFAULT_CUSTOM_THREADS if DEFAULT_CUSTOM_THREADS > 0 else 2
                    info_output = gr.Markdown(
                        value=get_model_info(DEFAULT_MODEL_KEY, n_threads=_default_threads)[0],
                        elem_classes=["stats-grid"]
                    )

                # --- Tab 2: Custom GGUF ---
                with gr.TabItem("🔧 Custom GGUF"):
                    gr.HTML('<div style="font-size: 0.85em; color: #64748b; margin-bottom: 10px;">Load any GGUF model from HuggingFace Hub</div>')

                    # HF Hub Search Component
                    model_search_input = HuggingfaceHubSearch(
                        label="🔍 Search HuggingFace Models",
                        placeholder="Type model name (e.g., 'qwen', 'phi', 'llama')",
                        search_type="model",
                    )

                    # File dropdown (populated after repo discovery)
                    custom_file_dropdown = gr.Dropdown(
                        label="📦 Select GGUF File",
                        choices=[],
                        value=None,
                        info="GGUF files appear after selecting a model above",
                        interactive=True,
                    )

                    # Load button
                    load_btn = gr.Button("⬇️ Load Selected Model", variant="primary", size="sm")

                    # Status message
                    custom_status = gr.Textbox(
                        label="Status",
                        interactive=False,
                        value="",
                        visible=False,
                    )

                    retry_btn = gr.Button("🔄 Retry", variant="secondary", visible=False)

                    # Model info for custom models (shows after loading)
                    gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Custom Model Info</div>')
                    custom_info_output = gr.Markdown(
                        value="*Load a model to see its specifications...*",
                        elem_classes=["stats-grid"]
                    )

        # ===== ADVANCED MODE =====
        with gr.TabItem("🧠 Advanced Mode (3-Model Pipeline)"):
            gr.HTML('<div style="font-size: 0.9em; color: #64748b; margin-bottom: 10px;">Extraction → Deduplication → Synthesis</div>')

            # Model Selection Row
            gr.HTML('<div class="section-header"><span class="section-icon">🤖</span> Model Selection</div>')
            with gr.Row():
                extraction_model = gr.Dropdown(
                    choices=[(EXTRACTION_MODELS[k]["name"], k) for k in EXTRACTION_MODELS.keys()],
                    value=DEFAULT_EXTRACTION_MODEL,
                    label="🔍 Stage 1: Extraction Model (≤1.7B)",
                    info="Extracts structured items from windows"
                )

                embedding_model = gr.Dropdown(
                    choices=[("granite-107m", "granite-107m"), ("granite-278m", "granite-278m"),
                             ("gemma-300m", "gemma-300m"), ("qwen-600m", "qwen-600m")],
                    value="granite-107m",
                    label="🧬 Stage 2: Embedding Model",
                    info="Deduplication via semantic similarity"
                )

                synthesis_model = gr.Dropdown(
                    choices=[(SYNTHESIS_MODELS[k]["name"], k) for k in SYNTHESIS_MODELS.keys()],
                    value=DEFAULT_SYNTHESIS_MODEL,
                    label="✨ Stage 3: Synthesis Model (1B-30B)",
                    info="Generates executive summary"
                )

            # Extraction Parameters Row
            gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">⚙️</span> Extraction Parameters</div>')
            with gr.Row():
                extraction_n_ctx = gr.Slider(
                    minimum=2048,
                    maximum=8192,
                    step=1024,
                    value=4096,
                    label="🪟 Extraction Context Window (n_ctx)",
                    info="Smaller = more windows, larger = fewer windows"
                )

                overlap_turns = gr.Slider(
                    minimum=1,
                    maximum=5,
                    step=1,
                    value=2,
                    label="🔄 Window Overlap (turns)",
                    info="Speaker turns shared between windows"
                )

            # Deduplication Parameters Row
            with gr.Row():
                similarity_threshold = gr.Slider(
                    minimum=0.70,
                    maximum=0.95,
                    step=0.01,
                    value=0.85,
                    label="🎯 Deduplication Similarity Threshold",
                    info="Higher = stricter duplicate detection"
                )

            # Reasoning Controls (separate checkboxes per stage)
            gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">🧠</span> Reasoning Configuration</div>')
            with gr.Row():
                enable_extraction_reasoning = gr.Checkbox(
                    value=False,
                    visible=False,
                    label="🧠 Enable Reasoning for Extraction",
                    info="Thinking before JSON (Qwen3 hybrid models only)"
                )

                enable_synthesis_reasoning = gr.Checkbox(
                    value=True,
                    visible=True,
                    label="🧠 Enable Reasoning for Synthesis",
                    info="Thinking for final summary generation"
                )

            # Output Settings Row
            gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">🌐</span> Output Settings</div>')
            with gr.Row():
                adv_output_language = gr.Radio(
                    choices=["en", "zh-TW"],
                    value="en",
                    label="Output Language",
                    info="Extraction auto-detects; synthesis uses this"
                )

                adv_max_tokens = gr.Slider(
                    minimum=512,
                    maximum=4096,
                    step=128,
                    value=2048,
                    label="📏 Max Synthesis Tokens",
                    info="Maximum tokens for final summary"
                )

            # Logging Control
            enable_detailed_logging = gr.Checkbox(
                value=True,
                label="📝 Enable Detailed Trace Logging",
                info="Save JSONL trace (embedded in download JSON)"
            )

# ==========================================

        outputs=[system_prompt_debug],
    )

    # ===== ADVANCED MODE EVENT HANDLERS =====

    # Update extraction reasoning checkbox visibility when extraction model changes
    def update_extraction_reasoning_visibility(model_key):
        """Show/hide extraction reasoning checkbox based on model capabilities."""
        if model_key not in EXTRACTION_MODELS:
            return gr.update(visible=False, value=False)

        config = EXTRACTION_MODELS[model_key]
        supports_toggle = config.get("supports_toggle", False)

        if supports_toggle:
            # Hybrid model
            return gr.update(visible=True, value=False, interactive=True, label="🧠 Enable Reasoning for Extraction")
        elif config.get("supports_reasoning", False):
            # Thinking-only model (none currently in extraction)
            return gr.update(visible=True, value=True, interactive=False, label="🧠 Reasoning Mode (Always On)")
        else:
            # Non-reasoning model
            return gr.update(visible=False, value=False)

    # Update synthesis reasoning checkbox visibility when synthesis model changes
    def update_synthesis_reasoning_visibility(model_key):
        """Show/hide synthesis reasoning checkbox based on model capabilities."""
        if model_key not in SYNTHESIS_MODELS:
            return gr.update(visible=False, value=False)

        config = SYNTHESIS_MODELS[model_key]
        supports_reasoning = config.get("supports_reasoning", False)
        supports_toggle = config.get("supports_toggle", False)

        if not supports_reasoning:
            # Non-reasoning model
            return gr.update(visible=False, value=False)
        elif supports_reasoning and not supports_toggle:
            # Thinking-only model
            return gr.update(visible=True, value=True, interactive=False, label="⚡ Reasoning Mode (Always On)")
        else:
            # Hybrid model
            return gr.update(visible=True, value=True, interactive=True, label="🧠 Enable Reasoning for Synthesis")

    # Wire up Advanced Mode event handlers
    extraction_model.change(
        fn=update_extraction_reasoning_visibility,
        inputs=[extraction_model],
        outputs=[enable_extraction_reasoning]
    )

    synthesis_model.change(
        fn=update_synthesis_reasoning_visibility,
        inputs=[synthesis_model],
        outputs=[enable_synthesis_reasoning]
    )

    # Debounced auto-discovery for custom repo ID (500ms delay)
    import time as time_module

        outputs=[custom_info_output],
    )

    # ===== SUBMIT BUTTON ROUTER =====
    # Routes to Standard or Advanced mode based on the active tab

    def route_summarize(
        # Standard mode inputs
        file_input_val, text_input_val, model_dropdown_val, enable_reasoning_val,
        max_tokens_val, temperature_val, top_p_val, top_k_val, language_val,
        thread_config_val, custom_threads_val, custom_model_val,
        # Advanced mode inputs
        extraction_model_val, embedding_model_val, synthesis_model_val,
        extraction_n_ctx_val, overlap_turns_val, similarity_threshold_val,
        enable_extraction_reasoning_val, enable_synthesis_reasoning_val,
        adv_output_language_val, adv_max_tokens_val, enable_logging_val,
        # Mode selector
        mode_tabs_val
    ):
        """Route to Standard or Advanced mode based on the selected tab."""

        # Determine active mode (Gradio returns index of active tab):
        # 0 = Standard Mode, 1 = Advanced Mode
        is_advanced_mode = (mode_tabs_val == 1)

        if is_advanced_mode:
            # Advanced Mode: use summarize_advanced()
            # Get n_threads
            thread_map = {"free": 2, "upgrade": 8, "custom": max(1, custom_threads_val)}
            n_threads = thread_map.get(thread_config_val, 2)

            # Get transcript
            transcript = ""
            if file_input_val:
                with open(file_input_val, 'r', encoding='utf-8') as f:
                    transcript = f.read()
            elif text_input_val:
                transcript = text_input_val
            else:
                yield ("", "⚠️ Please upload a file or paste text", "", {}, "")
                return

            # Stream Advanced Mode pipeline
            for update in summarize_advanced(
                transcript=transcript,
                extraction_model_key=extraction_model_val,
                embedding_model_key=embedding_model_val,
                synthesis_model_key=synthesis_model_val,
                extraction_n_ctx=extraction_n_ctx_val,
                overlap_turns=overlap_turns_val,
                similarity_threshold=similarity_threshold_val,
                enable_extraction_reasoning=enable_extraction_reasoning_val,
                enable_synthesis_reasoning=enable_synthesis_reasoning_val,
                output_language=adv_output_language_val,
                max_tokens=adv_max_tokens_val,
                enable_logging=enable_logging_val,
                n_threads=n_threads
            ):
                stage = update.get("stage", "")

                if stage == "extraction":
                    ticker = update.get("ticker", "")
                    thinking = update.get("thinking", "")
                    yield (thinking, ticker, "", {}, "")

                elif stage == "deduplication":
                    ticker = update.get("ticker", "")
                    yield ("", ticker, "", {}, "")

                elif stage == "synthesis":
                    thinking = update.get("thinking", "")
                    summary = update.get("summary", "")
                    yield (thinking, summary, "", {}, "")

                elif stage == "complete":
                    thinking = update.get("thinking", "")
                    summary = update.get("summary", "")
                    trace_stats = update.get("trace_stats", {})

                    # Format info message
                    info_msg = f"""**Advanced Mode Complete**
- Total Windows: {trace_stats.get('total_windows', 0)}
- Items Extracted: {trace_stats.get('total_items_extracted', 0)}
- Items After Dedup: {trace_stats.get('total_items_after_dedup', 0)}
- Duplicates Removed: {trace_stats.get('total_duplicates_removed', 0)}
- Total Time: {trace_stats.get('total_elapsed_seconds', 0):.1f}s"""

                    # Store trace for download
                    metrics = {
                        "mode": "advanced",
                        "trace_stats": trace_stats,
                        "trace_json": update.get("trace_json", [])
                    }

                    yield (thinking, summary, info_msg, metrics, "Advanced Mode (3-Model Pipeline)")

                elif stage == "error":
                    error = update.get("error", "Unknown error")
                    yield ("", f"❌ Error: {error}", "", {}, "")
                    return

        else:
            # Standard Mode: use existing summarize_streaming()
            for thinking, summary, info, metrics, system_prompt in summarize_streaming(
                file_input_val, text_input_val, model_dropdown_val, enable_reasoning_val,
                max_tokens_val, temperature_val, top_p_val, top_k_val, language_val,
                thread_config_val, custom_threads_val, custom_model_val
            ):
                yield (thinking, summary, info, metrics, system_prompt)

    # Wire up submit button with router
    submit_btn.click(
        fn=route_summarize,
        inputs=[
            # Standard mode inputs
            file_input, text_input, model_dropdown, enable_reasoning,
            max_tokens, temperature_slider, top_p, top_k, language_selector,
            thread_config_dropdown, custom_threads_slider, custom_model_state,
            # Advanced mode inputs
            extraction_model, embedding_model, synthesis_model,
            extraction_n_ctx, overlap_turns, similarity_threshold,
            enable_extraction_reasoning, enable_synthesis_reasoning,
            adv_output_language, adv_max_tokens, enable_detailed_logging,
            # Mode selector
            mode_tabs
        ],
        outputs=[thinking_output, summary_output, info_output, metrics_state, system_prompt_debug],
        show_progress="full"
    )

@@ -0,0 +1,13 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Tiny Scribe - Meeting Summarizer Module
|
| 3 |
+
|
| 4 |
+
This module provides advanced 3-stage meeting summarization:
|
| 5 |
+
1. Extraction: Extract structured items from transcript windows
|
| 6 |
+
2. Deduplication: Remove semantic duplicates using embeddings
|
| 7 |
+
3. Synthesis: Generate executive summary from deduplicated items
|
| 8 |
+
"""
|
| 9 |
+
|
| 10 |
+
__version__ = "1.0.0"
|
| 11 |
+
|
| 12 |
+
# Package exports will be added as we implement components
|
| 13 |
+
__all__ = []
|
|
@@ -0,0 +1,705 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Advanced Extraction Pipeline
|
| 3 |
+
|
| 4 |
+
Provides:
|
| 5 |
+
1. EMBEDDING_MODELS registry (4 models for deduplication)
|
| 6 |
+
2. NativeTokenizer - Count tokens without llama.cpp
|
| 7 |
+
3. EmbeddingModel - Load/compute embeddings
|
| 8 |
+
4. format_progress_ticker - Live UI updates
|
| 9 |
+
5. stream_extract_from_window - Stage 1: Extraction
|
| 10 |
+
6. deduplicate_items - Stage 2: Deduplication
|
| 11 |
+
7. stream_synthesize_executive_summary - Stage 3: Synthesis
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import re
|
| 15 |
+
import json
|
| 16 |
+
import time
|
| 17 |
+
import logging
|
| 18 |
+
from typing import Dict, List, Any, Tuple, Generator, Optional
|
| 19 |
+
from dataclasses import dataclass
|
| 20 |
+
import numpy as np
|
| 21 |
+
from llama_cpp import Llama
|
| 22 |
+
|
| 23 |
+
logger = logging.getLogger(__name__)
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
# ===== EMBEDDING MODELS REGISTRY =====
|
| 27 |
+
|
| 28 |
+
EMBEDDING_MODELS = {
|
| 29 |
+
"granite-107m": {
|
| 30 |
+
"name": "Granite 107M Multilingual (384-dim)",
|
| 31 |
+
"repo_id": "ibm-granite/granite-embedding-107m-multilingual",
|
| 32 |
+
"filename": "*Q8_0.gguf",
|
| 33 |
+
"embedding_dim": 384,
|
| 34 |
+
"max_context": 2048,
|
| 35 |
+
"description": "Fastest, multilingual, good for quick deduplication",
|
| 36 |
+
},
|
| 37 |
+
"granite-278m": {
|
| 38 |
+
"name": "Granite 278M Multilingual (768-dim)",
|
| 39 |
+
"repo_id": "ibm-granite/granite-embedding-278m-multilingual",
|
| 40 |
+
"filename": "*Q8_0.gguf",
|
| 41 |
+
"embedding_dim": 768,
|
| 42 |
+
"max_context": 2048,
|
| 43 |
+
"description": "Balanced speed/quality, multilingual",
|
| 44 |
+
},
|
| 45 |
+
"gemma-300m": {
|
| 46 |
+
"name": "Embedding Gemma 300M (768-dim)",
|
| 47 |
+
"repo_id": "unsloth/embeddinggemma-300m-GGUF",
|
| 48 |
+
"filename": "*Q8_0.gguf",
|
| 49 |
+
"embedding_dim": 768,
|
| 50 |
+
"max_context": 2048,
|
| 51 |
+
"description": "Google embedding model, strong semantics",
|
| 52 |
+
},
|
| 53 |
+
"qwen-600m": {
|
| 54 |
+
"name": "Qwen3 Embedding 600M (1024-dim)",
|
| 55 |
+
"repo_id": "Qwen/Qwen3-Embedding-0.6B-GGUF",
|
| 56 |
+
"filename": "*Q8_0.gguf",
|
| 57 |
+
"embedding_dim": 1024,
|
| 58 |
+
"max_context": 2048,
|
| 59 |
+
"description": "Highest quality, best for critical dedup",
|
| 60 |
+
},
|
| 61 |
+
}
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
# ===== NATIVE TOKENIZER =====
|
| 65 |
+
|
| 66 |
+
class NativeTokenizer:
|
| 67 |
+
"""
|
| 68 |
+
Simple tokenizer for counting tokens without llama.cpp.
|
| 69 |
+
Uses GPT-2 style approximation: ~1 token per 4 characters.
|
| 70 |
+
"""
|
| 71 |
+
|
| 72 |
+
def __init__(self):
|
| 73 |
+
"""Initialize tokenizer."""
|
| 74 |
+
self.chars_per_token = 4 # Conservative estimate
|
| 75 |
+
|
| 76 |
+
def count(self, text: str) -> int:
|
| 77 |
+
"""
|
| 78 |
+
Count tokens in text.
|
| 79 |
+
|
| 80 |
+
Args:
|
| 81 |
+
text: Input text
|
| 82 |
+
|
| 83 |
+
Returns:
|
| 84 |
+
Approximate token count
|
| 85 |
+
"""
|
| 86 |
+
if not text:
|
| 87 |
+
return 0
|
| 88 |
+
|
| 89 |
+
# Simple heuristic: 1 token ≈ 4 characters for English
|
| 90 |
+
# Adjust for CJK characters (Chinese/Japanese/Korean)
|
| 91 |
+
cjk_chars = len(re.findall(r'[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]', text))
|
| 92 |
+
non_cjk_chars = len(text) - cjk_chars
|
| 93 |
+
|
| 94 |
+
# CJK: 1 char ≈ 1 token, Non-CJK: 4 chars ≈ 1 token
|
| 95 |
+
tokens = cjk_chars + (non_cjk_chars // self.chars_per_token)
|
| 96 |
+
|
| 97 |
+
return max(1, tokens) # Minimum 1 token
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
# ===== EMBEDDING MODEL =====
|
| 101 |
+
|
| 102 |
+
class EmbeddingModel:
|
| 103 |
+
"""Wrapper for embedding models used in deduplication."""
|
| 104 |
+
|
| 105 |
+
def __init__(self, model_key: str, n_threads: int = 2):
|
| 106 |
+
"""
|
| 107 |
+
Initialize embedding model.
|
| 108 |
+
|
| 109 |
+
Args:
|
| 110 |
+
model_key: Key from EMBEDDING_MODELS registry
|
| 111 |
+
n_threads: CPU threads for inference
|
| 112 |
+
"""
|
| 113 |
+
if model_key not in EMBEDDING_MODELS:
|
| 114 |
+
raise ValueError(f"Unknown embedding model: {model_key}")
|
| 115 |
+
|
| 116 |
+
self.model_key = model_key
|
| 117 |
+
self.config = EMBEDDING_MODELS[model_key]
|
| 118 |
+
self.n_threads = n_threads
|
| 119 |
+
self.llm: Optional[Llama] = None
|
| 120 |
+
|
| 121 |
+
def load(self) -> str:
|
| 122 |
+
"""
|
| 123 |
+
Load embedding model.
|
| 124 |
+
|
| 125 |
+
Returns:
|
| 126 |
+
Info message
|
| 127 |
+
"""
|
| 128 |
+
logger.info(f"Loading embedding model: {self.config['name']}")
|
| 129 |
+
|
| 130 |
+
try:
|
| 131 |
+
self.llm = Llama.from_pretrained(
|
| 132 |
+
repo_id=self.config["repo_id"],
|
| 133 |
+
filename=self.config["filename"],
|
| 134 |
+
n_ctx=self.config["max_context"],
|
| 135 |
+
n_batch=512,
|
| 136 |
+
n_threads=self.n_threads,
|
| 137 |
+
n_threads_batch=self.n_threads,
|
| 138 |
+
n_gpu_layers=0, # CPU only for embeddings
|
| 139 |
+
verbose=False,
|
| 140 |
+
embedding=True, # Enable embedding mode
|
| 141 |
+
)
|
| 142 |
+
|
| 143 |
+
msg = f"✅ Loaded: {self.config['name']} ({self.config['embedding_dim']}-dim)"
|
| 144 |
+
logger.info(msg)
|
| 145 |
+
return msg
|
| 146 |
+
|
| 147 |
+
except Exception as e:
|
| 148 |
+
error_msg = f"❌ Failed to load {self.model_key}: {str(e)}"
|
| 149 |
+
logger.error(error_msg, exc_info=True)
|
| 150 |
+
raise Exception(error_msg)
|
| 151 |
+
|
| 152 |
+
def embed(self, text: str) -> np.ndarray:
|
| 153 |
+
"""
|
| 154 |
+
Compute embedding for text.
|
| 155 |
+
|
| 156 |
+
Args:
|
| 157 |
+
text: Input text
|
| 158 |
+
|
| 159 |
+
Returns:
|
| 160 |
+
Embedding vector (numpy array)
|
| 161 |
+
"""
|
| 162 |
+
if self.llm is None:
|
| 163 |
+
raise RuntimeError("Model not loaded. Call load() first.")
|
| 164 |
+
|
| 165 |
+
# Truncate text to max context
|
| 166 |
+
# Rough approximation: 1 token ≈ 4 chars
|
| 167 |
+
max_chars = self.config["max_context"] * 4
|
| 168 |
+
if len(text) > max_chars:
|
| 169 |
+
text = text[:max_chars]
|
| 170 |
+
|
| 171 |
+
# Get embedding
|
| 172 |
+
embedding = self.llm.embed(text)
|
| 173 |
+
|
| 174 |
+
# Normalize vector
|
| 175 |
+
norm = np.linalg.norm(embedding)
|
| 176 |
+
if norm > 0:
|
| 177 |
+
embedding = embedding / norm
|
| 178 |
+
|
| 179 |
+
return embedding
|
| 180 |
+
|
| 181 |
+
def unload(self) -> None:
|
| 182 |
+
"""Unload model and free memory."""
|
| 183 |
+
if self.llm:
|
| 184 |
+
logger.info(f"Unloading embedding model: {self.config['name']}")
|
| 185 |
+
del self.llm
|
| 186 |
+
self.llm = None
|
| 187 |
+
|
| 188 |
+
import gc
|
| 189 |
+
gc.collect()
|
| 190 |
+
time.sleep(0.5)
|
| 191 |
+
|
| 192 |
+
|
| 193 |
+
# ===== HELPER FUNCTIONS =====
|
| 194 |
+
|
| 195 |
+
@dataclass
|
| 196 |
+
class Window:
|
| 197 |
+
"""Represents a transcript window for extraction."""
|
| 198 |
+
id: int
|
| 199 |
+
content: str
|
| 200 |
+
start_turn: int
|
| 201 |
+
end_turn: int
|
| 202 |
+
token_count: int
|
| 203 |
+
|
| 204 |
+
|
| 205 |
+
def format_progress_ticker(
|
| 206 |
+
current_window: int,
|
| 207 |
+
total_windows: int,
|
| 208 |
+
window_tokens: int,
|
| 209 |
+
max_tokens: int,
|
| 210 |
+
items_found: Dict[str, int],
|
| 211 |
+
tokens_per_sec: float,
|
| 212 |
+
eta_seconds: int,
|
| 213 |
+
current_snippet: str
|
| 214 |
+
) -> str:
|
| 215 |
+
"""
|
| 216 |
+
Format progress ticker for extraction UI.
|
| 217 |
+
|
| 218 |
+
Args:
|
| 219 |
+
current_window: Current window number (1-indexed)
|
| 220 |
+
total_windows: Total number of windows
|
| 221 |
+
window_tokens: Tokens in current window
|
| 222 |
+
max_tokens: Maximum tokens (for percentage)
|
| 223 |
+
items_found: Dict of {category: count}
|
| 224 |
+
tokens_per_sec: Generation speed
|
| 225 |
+
eta_seconds: Estimated time to completion
|
| 226 |
+
current_snippet: Last extracted item (truncated)
|
| 227 |
+
|
| 228 |
+
Returns:
|
| 229 |
+
Formatted ticker string
|
| 230 |
+
"""
|
| 231 |
+
# Progress bar
|
| 232 |
+
progress_pct = (current_window / total_windows) * 100
|
| 233 |
+
bar_width = 20
|
| 234 |
+
filled = int(bar_width * progress_pct / 100)
|
| 235 |
+
bar = "█" * filled + "░" * (bar_width - filled)
|
| 236 |
+
|
| 237 |
+
# Item counts
|
| 238 |
+
action_items = items_found.get("action_items", 0)
|
| 239 |
+
decisions = items_found.get("decisions", 0)
|
| 240 |
+
key_points = items_found.get("key_points", 0)
|
| 241 |
+
questions = items_found.get("open_questions", 0)
|
| 242 |
+
total_items = action_items + decisions + key_points + questions
|
| 243 |
+
|
| 244 |
+
# ETA formatting
|
| 245 |
+
if eta_seconds > 60:
|
| 246 |
+
eta_str = f"{eta_seconds // 60}m {eta_seconds % 60}s"
|
| 247 |
+
else:
|
| 248 |
+
eta_str = f"{eta_seconds}s"
|
| 249 |
+
|
| 250 |
+
# Truncate snippet
|
| 251 |
+
snippet = current_snippet[:60] + "..." if len(current_snippet) > 60 else current_snippet
|
| 252 |
+
|
| 253 |
+
ticker = f"""
|
| 254 |
+
🪟 Window {current_window}/{total_windows} | {bar} {progress_pct:.0f}%
|
| 255 |
+
|
| 256 |
+
📊 Extracted: {total_items} items
|
| 257 |
+
✓ Actions: {action_items} | Decisions: {decisions} | Points: {key_points} | Questions: {questions}
|
| 258 |
+
|
| 259 |
+
⚡ Speed: {tokens_per_sec:.1f} tok/s | ETA: {eta_str}
|
| 260 |
+
📝 Latest: {snippet}
|
| 261 |
+
"""
|
| 262 |
+
|
| 263 |
+
return ticker.strip()
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
|
| 267 |
+
"""
|
| 268 |
+
Compute cosine similarity between two vectors.
|
| 269 |
+
|
| 270 |
+
Args:
|
| 271 |
+
vec1: First vector (normalized)
|
| 272 |
+
vec2: Second vector (normalized)
|
| 273 |
+
|
| 274 |
+
Returns:
|
| 275 |
+
Cosine similarity (0.0 to 1.0)
|
| 276 |
+
"""
|
| 277 |
+
# Vectors should already be normalized, but ensure it
|
| 278 |
+
dot_product = np.dot(vec1, vec2)
|
| 279 |
+
return float(dot_product)
|
| 280 |
+
|
| 281 |
+
|
| 282 |
+
# ===== HELPER FUNCTIONS =====
|
| 283 |
+
|
| 284 |
+
def _try_parse_extraction_json(text: str) -> Optional[Dict[str, List[str]]]:
|
| 285 |
+
"""
|
| 286 |
+
Attempt to parse extraction JSON from LLM output.
|
| 287 |
+
|
| 288 |
+
Args:
|
| 289 |
+
text: Raw LLM output
|
| 290 |
+
|
| 291 |
+
Returns:
|
| 292 |
+
Parsed dict or None if invalid
|
| 293 |
+
"""
|
| 294 |
+
# Remove markdown code blocks
|
| 295 |
+
text = re.sub(r'```json\s*', '', text)
|
| 296 |
+
text = re.sub(r'```\s*$', '', text)
|
| 297 |
+
text = text.strip()
|
| 298 |
+
|
| 299 |
+
try:
|
| 300 |
+
data = json.loads(text)
|
| 301 |
+
|
| 302 |
+
# Validate schema
|
| 303 |
+
required_keys = {"action_items", "decisions", "key_points", "open_questions"}
|
| 304 |
+
if not isinstance(data, dict) or not required_keys.issubset(data.keys()):
|
| 305 |
+
return None
|
| 306 |
+
|
| 307 |
+
# Validate all values are lists
|
| 308 |
+
for key in required_keys:
|
| 309 |
+
if not isinstance(data[key], list):
|
| 310 |
+
return None
|
| 311 |
+
|
| 312 |
+
return data
|
| 313 |
+
|
| 314 |
+
except json.JSONDecodeError:
|
| 315 |
+
return None
|
| 316 |
+
|
| 317 |
+
|
| 318 |
+
def _sample_llm_response(text: str, max_chars: int = 400) -> str:
|
| 319 |
+
"""Sample LLM response for trace logging."""
|
| 320 |
+
if not text:
|
| 321 |
+
return ""
|
| 322 |
+
return text[:max_chars] if len(text) > max_chars else text
|
| 323 |
+
|
| 324 |
+
|
| 325 |
+
# ===== CORE PIPELINE FUNCTIONS =====
|
| 326 |
+
|
| 327 |
+
def stream_extract_from_window(
|
| 328 |
+
extraction_llm: Llama,
|
| 329 |
+
window: Window,
|
| 330 |
+
window_id: int,
|
| 331 |
+
total_windows: int,
|
| 332 |
+
tracer: Any,
|
| 333 |
+
tokenizer: NativeTokenizer,
|
| 334 |
+
model_config: Dict[str, Any],
|
| 335 |
+
enable_reasoning: bool = False
|
| 336 |
+
) -> Generator[Tuple[str, str, Dict[str, List[str]], bool], None, None]:
|
| 337 |
+
"""
|
| 338 |
+
Stream extraction from single window with live progress + optional reasoning.
|
| 339 |
+
|
| 340 |
+
Yields:
|
| 341 |
+
(ticker_text, thinking_text, partial_items, is_complete)
|
| 342 |
+
- ticker_text: Progress ticker for UI
|
| 343 |
+
- thinking_text: Reasoning/thinking blocks (if model supports)
|
| 344 |
+
- partial_items: Current extracted items
|
| 345 |
+
- is_complete: True on final yield
|
| 346 |
+
"""
|
| 347 |
+
# Auto-detect language from window content
|
| 348 |
+
has_cjk = bool(re.search(r'[\u4e00-\u9fff]', window.content))
|
| 349 |
+
output_language = "zh-TW" if has_cjk else "en"
|
| 350 |
+
|
| 351 |
+
# Build system prompt
|
| 352 |
+
from meeting_summarizer.trace import Tracer # Avoid circular import
|
| 353 |
+
|
| 354 |
+
supports_reasoning = model_config.get("supports_reasoning", False)
|
| 355 |
+
supports_toggle = model_config.get("supports_toggle", False)
|
| 356 |
+
|
| 357 |
+
# Build system prompt (reuse function from app.py via import)
|
| 358 |
+
if output_language == "zh-TW":
|
| 359 |
+
reasoning_inst = "使用推理能力分析後提取。" if (supports_toggle and enable_reasoning) else ""
|
| 360 |
+
system_prompt = f"""你是會議分析助手。{reasoning_inst}
|
| 361 |
+
|
| 362 |
+
僅輸出 JSON:
|
| 363 |
+
{{
|
| 364 |
+
"action_items": ["任務", ...],
|
| 365 |
+
"decisions": ["決策", ...],
|
| 366 |
+
"key_points": ["要點", ...],
|
| 367 |
+
"open_questions": ["問題", ...]
|
| 368 |
+
}}"""
|
| 369 |
+
else:
|
| 370 |
+
reasoning_inst = "Use reasoning before extracting." if (supports_toggle and enable_reasoning) else ""
|
| 371 |
+
system_prompt = f"""You are a meeting assistant. {reasoning_inst}
|
| 372 |
+
|
| 373 |
+
Output ONLY JSON:
|
| 374 |
+
{{
|
| 375 |
+
"action_items": ["Task", ...],
|
| 376 |
+
"decisions": ["Decision", ...],
|
| 377 |
+
"key_points": ["Point", ...],
|
| 378 |
+
"open_questions": ["Question", ...]
|
| 379 |
+
}}"""
|
| 380 |
+
|
| 381 |
+
user_prompt = f"Transcript:\n\n{window.content}"
|
| 382 |
+
|
| 383 |
+
messages = [
|
| 384 |
+
{"role": "system", "content": system_prompt},
|
| 385 |
+
{"role": "user", "content": user_prompt}
|
| 386 |
+
]
|
| 387 |
+
|
| 388 |
+
# Stream extraction
|
| 389 |
+
full_response = ""
|
| 390 |
+
thinking_content = ""
|
| 391 |
+
start_time = time.time()
|
| 392 |
+
first_token_time = None
|
| 393 |
+
token_count = 0
|
| 394 |
+
|
| 395 |
+
try:
|
| 396 |
+
settings = model_config["inference_settings"]
|
| 397 |
+
stream = extraction_llm.create_chat_completion(
|
| 398 |
+
messages=messages,
|
| 399 |
+
max_tokens=1024,
|
| 400 |
+
temperature=settings["temperature"],
|
| 401 |
+
top_p=settings["top_p"],
|
| 402 |
+
top_k=settings["top_k"],
|
| 403 |
+
repeat_penalty=settings["repeat_penalty"],
|
| 404 |
+
stream=True,
|
| 405 |
+
)
|
| 406 |
+
|
| 407 |
+
for chunk in stream:
|
| 408 |
+
if 'choices' in chunk and len(chunk['choices']) > 0:
|
| 409 |
+
delta = chunk['choices'][0].get('delta', {})
|
| 410 |
+
content = delta.get('content', '')
|
| 411 |
+
|
| 412 |
+
if content:
|
| 413 |
+
if first_token_time is None:
|
| 414 |
+
first_token_time = time.time()
|
| 415 |
+
|
| 416 |
+
token_count += 1
|
| 417 |
+
full_response += content
|
| 418 |
+
|
| 419 |
+
# Parse thinking blocks if reasoning enabled
|
| 420 |
+
if enable_reasoning and supports_reasoning:
|
| 421 |
+
# Simple regex extraction
|
| 422 |
+
thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_response, re.DOTALL)
|
| 423 |
+
if thinking_match:
|
| 424 |
+
thinking_content = thinking_match.group(1).strip()
|
| 425 |
+
json_text = full_response[:thinking_match.start()] + full_response[thinking_match.end():]
|
| 426 |
+
else:
|
| 427 |
+
json_text = full_response
|
| 428 |
+
else:
|
| 429 |
+
json_text = full_response
|
| 430 |
+
|
| 431 |
+
# Try parse JSON
|
| 432 |
+
partial_items = _try_parse_extraction_json(json_text)
|
| 433 |
+
if not partial_items:
|
| 434 |
+
partial_items = {"action_items": [], "decisions": [], "key_points": [], "open_questions": []}
|
| 435 |
+
|
| 436 |
+
# Calculate metrics
|
| 437 |
+
elapsed = time.time() - start_time
|
| 438 |
+
tps = token_count / elapsed if elapsed > 0 else 0
|
| 439 |
+
eta = int((1024 - token_count) / tps) if tps > 0 else 0
|
| 440 |
+
|
| 441 |
+
# Get item counts
|
| 442 |
+
items_found = {k: len(v) for k, v in partial_items.items()}
|
| 443 |
+
|
| 444 |
+
# Get last item as snippet
|
| 445 |
+
last_item = ""
|
| 446 |
+
for cat in ["action_items", "decisions", "key_points", "open_questions"]:
|
| 447 |
+
if partial_items.get(cat):
|
| 448 |
+
last_item = partial_items[cat][-1]
|
| 449 |
+
break
|
| 450 |
+
|
| 451 |
+
# Format ticker
|
| 452 |
+
ticker = format_progress_ticker(
|
| 453 |
+
current_window=window_id,
|
| 454 |
+
total_windows=total_windows,
|
| 455 |
+
window_tokens=window.token_count,
|
| 456 |
+
max_tokens=4096,
|
| 457 |
+
items_found=items_found,
|
| 458 |
+
tokens_per_sec=tps,
|
| 459 |
+
eta_seconds=eta,
|
| 460 |
+
current_snippet=last_item
|
| 461 |
+
)
|
| 462 |
+
|
| 463 |
+
yield (ticker, thinking_content, partial_items, False)
|
| 464 |
+
|
| 465 |
+
# Final parse
|
| 466 |
+
if enable_reasoning and supports_reasoning:
|
| 467 |
+
thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_response, re.DOTALL)
|
| 468 |
+
if thinking_match:
|
| 469 |
+
thinking_content = thinking_match.group(1).strip()
|
| 470 |
+
json_text = full_response[:thinking_match.start()] + full_response[thinking_match.end():]
|
| 471 |
+
else:
|
| 472 |
+
json_text = full_response
|
| 473 |
+
else:
|
| 474 |
+
json_text = full_response
|
| 475 |
+
|
| 476 |
+
final_items = _try_parse_extraction_json(json_text)
|
| 477 |
+
|
| 478 |
+
if not final_items:
|
| 479 |
+
error_msg = f"Failed to parse JSON from window {window_id}"
|
| 480 |
+
tracer.log_extraction(
|
| 481 |
+
window_id=window_id,
|
| 482 |
+
extraction=None,
|
| 483 |
+
llm_response=_sample_llm_response(full_response),
|
| 484 |
+
error=error_msg
|
| 485 |
+
)
|
| 486 |
+
raise ValueError(error_msg)
|
| 487 |
+
|
| 488 |
+
# Log success
|
| 489 |
+
tracer.log_extraction(
|
| 490 |
+
window_id=window_id,
|
| 491 |
+
extraction=final_items,
|
| 492 |
+
llm_response=_sample_llm_response(full_response),
|
| 493 |
+
thinking=_sample_llm_response(thinking_content) if thinking_content else None,
|
| 494 |
+
error=None
|
| 495 |
+
)
|
| 496 |
+
|
| 497 |
+
# Final ticker
|
| 498 |
+
elapsed = time.time() - start_time
|
| 499 |
+
tps = token_count / elapsed if elapsed > 0 else 0
|
| 500 |
+
items_found = {k: len(v) for k, v in final_items.items()}
|
| 501 |
+
|
| 502 |
+
ticker = format_progress_ticker(
|
| 503 |
+
current_window=window_id,
|
| 504 |
+
total_windows=total_windows,
|
| 505 |
+
window_tokens=window.token_count,
|
| 506 |
+
max_tokens=4096,
|
| 507 |
+
items_found=items_found,
|
| 508 |
+
tokens_per_sec=tps,
|
| 509 |
+
eta_seconds=0,
|
| 510 |
+
current_snippet="✅ Extraction complete"
|
| 511 |
+
)
|
| 512 |
+
|
| 513 |
+
yield (ticker, thinking_content, final_items, True)
|
| 514 |
+
|
| 515 |
+
except Exception as e:
|
| 516 |
+
tracer.log_extraction(
|
| 517 |
+
window_id=window_id,
|
| 518 |
+
extraction=None,
|
| 519 |
+
llm_response=_sample_llm_response(full_response) if full_response else "",
|
| 520 |
+
error=str(e)
|
| 521 |
+
)
|
| 522 |
+
raise
|
| 523 |
+
|
| 524 |
+
|
| 525 |
+
def deduplicate_items(
|
| 526 |
+
all_items: Dict[str, List[str]],
|
| 527 |
+
embedding_model: EmbeddingModel,
|
| 528 |
+
similarity_threshold: float,
|
| 529 |
+
tracer: Any
|
| 530 |
+
) -> Dict[str, List[str]]:
|
| 531 |
+
"""
|
| 532 |
+
Deduplicate items across all categories using embeddings.
|
| 533 |
+
|
| 534 |
+
Args:
|
| 535 |
+
all_items: Dict of {category: [items]}
|
| 536 |
+
embedding_model: Loaded embedding model
|
| 537 |
+
similarity_threshold: Cosine similarity threshold (0.0-1.0)
|
| 538 |
+
tracer: Tracer instance
|
| 539 |
+
|
| 540 |
+
Returns:
|
| 541 |
+
Deduplicated dict of {category: [items]}
|
| 542 |
+
"""
|
| 543 |
+
deduplicated = {}
|
| 544 |
+
|
| 545 |
+
for category, items in all_items.items():
|
| 546 |
+
if not items:
|
| 547 |
+
deduplicated[category] = []
|
| 548 |
+
continue
|
| 549 |
+
|
| 550 |
+
original_count = len(items)
|
| 551 |
+
|
| 552 |
+
# Compute embeddings for all items
|
| 553 |
+
embeddings = []
|
| 554 |
+
for item in items:
|
| 555 |
+
emb = embedding_model.embed(item)
|
| 556 |
+
embeddings.append(emb)
|
| 557 |
+
|
| 558 |
+
# Mark duplicates
|
| 559 |
+
keep_indices = []
|
| 560 |
+
for i in range(len(items)):
|
| 561 |
+
is_duplicate = False
|
| 562 |
+
|
| 563 |
+
# Compare with all previously kept items
|
| 564 |
+
for j in keep_indices:
|
| 565 |
+
similarity = cosine_similarity(embeddings[i], embeddings[j])
|
| 566 |
+
if similarity >= similarity_threshold:
|
| 567 |
+
is_duplicate = True
|
| 568 |
+
break
|
| 569 |
+
|
| 570 |
+
if not is_duplicate:
|
| 571 |
+
keep_indices.append(i)
|
| 572 |
+
|
| 573 |
+
# Keep only unique items
|
| 574 |
+
unique_items = [items[i] for i in keep_indices]
|
| 575 |
+
deduplicated[category] = unique_items
|
| 576 |
+
|
| 577 |
+
# Log deduplication
|
| 578 |
+
duplicates_removed = original_count - len(unique_items)
|
| 579 |
+
tracer.log_deduplication(
|
| 580 |
+
category=category,
|
| 581 |
+
original_count=original_count,
|
| 582 |
+
deduplicated_count=len(unique_items),
|
| 583 |
+
duplicates_removed=duplicates_removed,
|
| 584 |
+
similarity_threshold=similarity_threshold,
|
| 585 |
+
embedding_model=embedding_model.model_key
|
| 586 |
+
)
|
| 587 |
+
|
| 588 |
+
logger.info(f"Dedup {category}: {original_count} → {len(unique_items)} ({duplicates_removed} removed)")
|
| 589 |
+
|
| 590 |
+
return deduplicated
|
| 591 |
+
|
| 592 |
+
|
| 593 |
+
def stream_synthesize_executive_summary(
|
| 594 |
+
synthesis_llm: Llama,
|
| 595 |
+
deduplicated_items: Dict[str, List[str]],
|
| 596 |
+
model_config: Dict[str, Any],
|
| 597 |
+
output_language: str,
|
| 598 |
+
enable_reasoning: bool,
|
| 599 |
+
max_tokens: int,
|
| 600 |
+
tracer: Any
|
| 601 |
+
) -> Generator[Tuple[str, str, bool], None, None]:
|
| 602 |
+
"""
|
| 603 |
+
Stream synthesis of executive summary from deduplicated items.
|
| 604 |
+
|
| 605 |
+
Yields:
|
| 606 |
+
(summary_text, thinking_text, is_complete)
|
| 607 |
+
"""
|
| 608 |
+
# Build synthesis prompt
|
| 609 |
+
item_counts = {k: len(v) for k, v in deduplicated_items.items()}
|
| 610 |
+
|
| 611 |
+
# Format items for prompt
|
| 612 |
+
items_text = ""
|
| 613 |
+
for category, items in deduplicated_items.items():
|
| 614 |
+
if items:
|
| 615 |
+
category_label = {
|
| 616 |
+
"action_items": "Action Items" if output_language == "en" else "行動項目",
|
| 617 |
+
"decisions": "Decisions" if output_language == "en" else "決策",
|
| 618 |
+
"key_points": "Key Points" if output_language == "en" else "關鍵要點",
|
| 619 |
+
"open_questions": "Open Questions" if output_language == "en" else "未解決問題"
|
| 620 |
+
}.get(category, category)
|
| 621 |
+
|
| 622 |
+
items_text += f"\n{category_label}:\n"
|
| 623 |
+
for i, item in enumerate(items, 1):
|
| 624 |
+
items_text += f"{i}. {item}\n"
|
| 625 |
+
|
| 626 |
+
if output_language == "zh-TW":
|
| 627 |
+
system_prompt = "你是執行摘要專家。生成簡潔的執行摘要。"
|
| 628 |
+
user_prompt = f"基於以下結構化資訊生成執行摘要:\n{items_text}\n\n請提供簡明的執行摘要。"
|
| 629 |
+
else:
|
| 630 |
+
system_prompt = "You are an executive summary expert. Generate concise summaries."
|
| 631 |
+
user_prompt = f"Generate an executive summary based on these structured items:\n{items_text}\n\nProvide a concise executive summary."
|
| 632 |
+
|
| 633 |
+
messages = [
|
| 634 |
+
{"role": "system", "content": system_prompt},
|
| 635 |
+
{"role": "user", "content": user_prompt}
|
| 636 |
+
]
|
| 637 |
+
|
| 638 |
+
# Stream synthesis
|
| 639 |
+
full_summary = ""
|
| 640 |
+
thinking_content = ""
|
| 641 |
+
|
| 642 |
+
try:
|
| 643 |
+
settings = model_config["inference_settings"]
|
| 644 |
+
stream = synthesis_llm.create_chat_completion(
|
| 645 |
+
messages=messages,
|
| 646 |
+
max_tokens=max_tokens,
|
| 647 |
+
temperature=settings["temperature"],
|
| 648 |
+
top_p=settings["top_p"],
|
| 649 |
+
top_k=settings["top_k"],
|
| 650 |
+
repeat_penalty=settings["repeat_penalty"],
|
| 651 |
+
stream=True,
|
| 652 |
+
)
|
| 653 |
+
|
| 654 |
+
for chunk in stream:
|
| 655 |
+
if 'choices' in chunk and len(chunk['choices']) > 0:
|
| 656 |
+
delta = chunk['choices'][0].get('delta', {})
|
| 657 |
+
content = delta.get('content', '')
|
| 658 |
+
|
| 659 |
+
if content:
|
| 660 |
+
full_summary += content
|
| 661 |
+
|
| 662 |
+
# Parse thinking if reasoning enabled
|
| 663 |
+
if enable_reasoning and model_config.get("supports_reasoning"):
|
| 664 |
+
thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_summary, re.DOTALL)
|
| 665 |
+
if thinking_match:
|
| 666 |
+
thinking_content = thinking_match.group(1).strip()
|
| 667 |
+
summary_text = full_summary[:thinking_match.start()] + full_summary[thinking_match.end():]
|
| 668 |
+
else:
|
| 669 |
+
summary_text = full_summary
|
| 670 |
+
else:
|
| 671 |
+
summary_text = full_summary
|
| 672 |
+
|
| 673 |
+
yield (summary_text, thinking_content, False)
|
| 674 |
+
|
| 675 |
+
# Final parse
|
| 676 |
+
if enable_reasoning and model_config.get("supports_reasoning"):
|
| 677 |
+
thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_summary, re.DOTALL)
|
| 678 |
+
if thinking_match:
|
| 679 |
+
thinking_content = thinking_match.group(1).strip()
|
| 680 |
+
summary_text = full_summary[:thinking_match.start()] + full_summary[thinking_match.end():]
|
| 681 |
+
else:
|
| 682 |
+
summary_text = full_summary
|
| 683 |
+
else:
|
| 684 |
+
summary_text = full_summary
|
| 685 |
+
|
| 686 |
+
# Log synthesis
|
| 687 |
+
tracer.log_synthesis(
|
| 688 |
+
synthesis_model=model_config["name"],
|
| 689 |
+
input_item_counts=item_counts,
|
| 690 |
+
output_summary=_sample_llm_response(summary_text),
|
| 691 |
+
thinking=_sample_llm_response(thinking_content) if thinking_content else None,
|
| 692 |
+
error=None
|
| 693 |
+
)
|
| 694 |
+
|
| 695 |
+
yield (summary_text, thinking_content, True)
|
| 696 |
+
|
| 697 |
+
except Exception as e:
|
| 698 |
+
tracer.log_synthesis(
|
| 699 |
+
synthesis_model=model_config["name"],
|
| 700 |
+
input_item_counts=item_counts,
|
| 701 |
+
output_summary="",
|
| 702 |
+
thinking=None,
|
| 703 |
+
error=str(e)
|
| 704 |
+
)
|
| 705 |
+
raise
|
|
@@ -0,0 +1,197 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Trace Logger for Advanced Mode Pipeline
|
| 3 |
+
|
| 4 |
+
Logs extraction, deduplication, and synthesis operations for debugging
|
| 5 |
+
and audit trail. Supports JSONL format for easy parsing.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import json
|
| 9 |
+
import time
|
| 10 |
+
from typing import Dict, List, Any, Optional
|
| 11 |
+
from datetime import datetime
|
| 12 |
+
import logging
|
| 13 |
+
|
| 14 |
+
logger = logging.getLogger(__name__)
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
class Tracer:
|
| 18 |
+
"""Trace logger for Advanced Mode 3-stage pipeline."""
|
| 19 |
+
|
| 20 |
+
def __init__(self, enabled: bool = True):
|
| 21 |
+
"""
|
| 22 |
+
Initialize tracer.
|
| 23 |
+
|
| 24 |
+
Args:
|
| 25 |
+
enabled: Whether to enable trace logging
|
| 26 |
+
"""
|
| 27 |
+
self.enabled = enabled
|
| 28 |
+
self.trace_entries: List[Dict[str, Any]] = []
|
| 29 |
+
self.start_time = time.time()
|
| 30 |
+
|
| 31 |
+
def log_extraction(
|
| 32 |
+
self,
|
| 33 |
+
window_id: int,
|
| 34 |
+
extraction: Optional[Dict[str, List[str]]],
|
| 35 |
+
llm_response: str,
|
| 36 |
+
thinking: Optional[str] = None,
|
| 37 |
+
error: Optional[str] = None
|
| 38 |
+
) -> None:
|
| 39 |
+
"""
|
| 40 |
+
Log extraction operation for a single window.
|
| 41 |
+
|
| 42 |
+
Args:
|
| 43 |
+
window_id: Window identifier
|
| 44 |
+
extraction: Extracted items dict (action_items, decisions, key_points, open_questions)
|
| 45 |
+
llm_response: Sampled LLM response (first 400 chars)
|
| 46 |
+
thinking: Sampled thinking/reasoning content (if applicable)
|
| 47 |
+
error: Error message if extraction failed
|
| 48 |
+
"""
|
| 49 |
+
if not self.enabled:
|
| 50 |
+
return
|
| 51 |
+
|
| 52 |
+
entry = {
|
| 53 |
+
"stage": "extraction",
|
| 54 |
+
"timestamp": datetime.now().isoformat(),
|
| 55 |
+
"elapsed_seconds": round(time.time() - self.start_time, 2),
|
| 56 |
+
"window_id": window_id,
|
| 57 |
+
"success": extraction is not None and error is None,
|
| 58 |
+
"error": error,
|
| 59 |
+
"extraction": extraction,
|
| 60 |
+
"llm_response_sample": llm_response[:400] if llm_response else None,
|
| 61 |
+
"thinking_sample": thinking[:400] if thinking else None,
|
| 62 |
+
}
|
| 63 |
+
|
| 64 |
+
self.trace_entries.append(entry)
|
| 65 |
+
logger.debug(f"[Trace] Extraction window {window_id}: {entry['success']}")
|
| 66 |
+
|
| 67 |
+
def log_deduplication(
|
| 68 |
+
self,
|
| 69 |
+
category: str,
|
| 70 |
+
original_count: int,
|
| 71 |
+
deduplicated_count: int,
|
| 72 |
+
duplicates_removed: int,
|
| 73 |
+
similarity_threshold: float,
|
| 74 |
+
embedding_model: str
|
| 75 |
+
) -> None:
|
| 76 |
+
"""
|
| 77 |
+
Log deduplication operation for a category.
|
| 78 |
+
|
| 79 |
+
Args:
|
| 80 |
+
category: Category name (action_items, decisions, etc.)
|
| 81 |
+
original_count: Number of items before deduplication
|
| 82 |
+
deduplicated_count: Number of items after deduplication
|
| 83 |
+
duplicates_removed: Number of duplicates removed
|
| 84 |
+
similarity_threshold: Similarity threshold used
|
| 85 |
+
embedding_model: Embedding model used
|
| 86 |
+
"""
|
| 87 |
+
if not self.enabled:
|
| 88 |
+
return
|
| 89 |
+
|
| 90 |
+
entry = {
|
| 91 |
+
"stage": "deduplication",
|
| 92 |
+
"timestamp": datetime.now().isoformat(),
|
| 93 |
+
"elapsed_seconds": round(time.time() - self.start_time, 2),
|
| 94 |
+
"category": category,
|
| 95 |
+
"original_count": original_count,
|
| 96 |
+
"deduplicated_count": deduplicated_count,
|
| 97 |
+
"duplicates_removed": duplicates_removed,
|
| 98 |
+
"duplicate_rate": round(duplicates_removed / original_count * 100, 1) if original_count > 0 else 0.0,
|
| 99 |
+
"similarity_threshold": similarity_threshold,
|
| 100 |
+
"embedding_model": embedding_model,
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
self.trace_entries.append(entry)
|
| 104 |
+
logger.debug(f"[Trace] Deduplication {category}: {original_count} → {deduplicated_count} ({duplicates_removed} removed)")
|
| 105 |
+
|
| 106 |
+
def log_synthesis(
|
| 107 |
+
self,
|
| 108 |
+
synthesis_model: str,
|
| 109 |
+
input_item_counts: Dict[str, int],
|
| 110 |
+
output_summary: str,
|
| 111 |
+
thinking: Optional[str] = None,
|
| 112 |
+
error: Optional[str] = None
|
| 113 |
+
) -> None:
|
| 114 |
+
"""
|
| 115 |
+
Log synthesis operation.
|
| 116 |
+
|
| 117 |
+
Args:
|
| 118 |
+
synthesis_model: Model key used for synthesis
|
| 119 |
+
input_item_counts: Dict of category counts fed to synthesis
|
| 120 |
+
output_summary: Generated summary (sampled)
|
| 121 |
+
thinking: Thinking/reasoning content (sampled, if applicable)
|
| 122 |
+
error: Error message if synthesis failed
|
| 123 |
+
"""
|
| 124 |
+
if not self.enabled:
|
| 125 |
+
return
|
| 126 |
+
|
| 127 |
+
entry = {
|
| 128 |
+
"stage": "synthesis",
|
| 129 |
+
"timestamp": datetime.now().isoformat(),
|
| 130 |
+
"elapsed_seconds": round(time.time() - self.start_time, 2),
|
| 131 |
+
"synthesis_model": synthesis_model,
|
| 132 |
+
"input_item_counts": input_item_counts,
|
| 133 |
+
"success": error is None,
|
| 134 |
+
"error": error,
|
| 135 |
+
"output_summary_sample": output_summary[:400] if output_summary else None,
|
| 136 |
+
"thinking_sample": thinking[:400] if thinking else None,
|
| 137 |
+
}
|
| 138 |
+
|
| 139 |
+
self.trace_entries.append(entry)
|
| 140 |
+
logger.debug(f"[Trace] Synthesis: {entry['success']}")
|
| 141 |
+
|
| 142 |
+
def get_trace_jsonl(self) -> str:
|
| 143 |
+
"""
|
| 144 |
+
Get trace entries as JSONL string.
|
| 145 |
+
|
| 146 |
+
Returns:
|
| 147 |
+
JSONL string (one JSON object per line)
|
| 148 |
+
"""
|
| 149 |
+
if not self.enabled:
|
| 150 |
+
return ""
|
| 151 |
+
|
| 152 |
+
return "\n".join(json.dumps(entry, ensure_ascii=False) for entry in self.trace_entries)
|
| 153 |
+
|
| 154 |
+
def get_trace_json(self) -> List[Dict[str, Any]]:
|
| 155 |
+
"""
|
| 156 |
+
Get trace entries as list of dicts.
|
| 157 |
+
|
| 158 |
+
Returns:
|
| 159 |
+
List of trace entry dicts
|
| 160 |
+
"""
|
| 161 |
+
if not self.enabled:
|
| 162 |
+
return []
|
| 163 |
+
|
| 164 |
+
return self.trace_entries
|
| 165 |
+
|
| 166 |
+
def get_summary_stats(self) -> Dict[str, Any]:
|
| 167 |
+
"""
|
| 168 |
+
Get summary statistics from trace.
|
| 169 |
+
|
| 170 |
+
Returns:
|
| 171 |
+
Dict with pipeline statistics
|
| 172 |
+
"""
|
| 173 |
+
if not self.enabled or not self.trace_entries:
|
| 174 |
+
return {}
|
| 175 |
+
|
| 176 |
+
extraction_entries = [e for e in self.trace_entries if e["stage"] == "extraction"]
|
| 177 |
+
dedup_entries = [e for e in self.trace_entries if e["stage"] == "deduplication"]
|
| 178 |
+
synthesis_entries = [e for e in self.trace_entries if e["stage"] == "synthesis"]
|
| 179 |
+
|
| 180 |
+
total_extracted = sum(
|
| 181 |
+
sum(e["extraction"].values()) if e.get("extraction") else 0
|
| 182 |
+
for e in extraction_entries
|
| 183 |
+
)
|
| 184 |
+
|
| 185 |
+
total_deduplicated = sum(e["deduplicated_count"] for e in dedup_entries)
|
| 186 |
+
total_duplicates = sum(e["duplicates_removed"] for e in dedup_entries)
|
| 187 |
+
|
| 188 |
+
return {
|
| 189 |
+
"total_windows": len(extraction_entries),
|
| 190 |
+
"successful_extractions": sum(1 for e in extraction_entries if e["success"]),
|
| 191 |
+
"total_items_extracted": total_extracted,
|
| 192 |
+
"total_items_after_dedup": total_deduplicated,
|
| 193 |
+
"total_duplicates_removed": total_duplicates,
|
| 194 |
+
"duplicate_rate": round(total_duplicates / total_extracted * 100, 1) if total_extracted > 0 else 0.0,
|
| 195 |
+
"synthesis_success": synthesis_entries[0]["success"] if synthesis_entries else False,
|
| 196 |
+
"total_elapsed_seconds": round(time.time() - self.start_time, 2),
|
| 197 |
+
}
|