Luigi committed on
Commit 7a3423c · 1 Parent(s): b7e57ed

feat: Implement Advanced Mode 3-stage pipeline (extraction → deduplication → synthesis)


Add complete 3-model pipeline with independent model registries and parameters:

**New Modules:**
- meeting_summarizer/__init__.py: Package initialization
- meeting_summarizer/trace.py: Tracer with extraction/dedup/synthesis logging
- meeting_summarizer/extraction.py: Complete pipeline (~700 lines)
  - NativeTokenizer: Token counting without llama.cpp
  - EmbeddingModel: Embedding computation for deduplication
  - stream_extract_from_window(): Stage 1 extraction with reasoning
  - deduplicate_items(): Stage 2 semantic deduplication
  - stream_synthesize_executive_summary(): Stage 3 synthesis
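The commit message only names the windowing behavior (token-budget windows with turn overlap, see `overlap_turns` below); a minimal, self-contained sketch of that idea — illustrative names only, not the actual `meeting_summarizer/extraction.py` API — looks like this:

```python
def make_windows(lines, count_tokens, max_tokens, overlap=1):
    """Greedy token-budget windowing with a small turn overlap (sketch)."""
    windows, current, tokens = [], [], 0
    for line in lines:
        t = count_tokens(line)
        if current and tokens + t > max_tokens:
            windows.append(current)
            current = current[-overlap:]  # carry the last `overlap` turns forward
            tokens = sum(count_tokens(l) for l in current)
        current.append(line)
        tokens += t
    if current:
        windows.append(current)
    return windows

# Four 2-token turns under a 4-token budget produce overlapping windows.
print(make_windows(["a b", "c d", "e f", "g h"],
                   lambda s: len(s.split()), max_tokens=4))
# → [['a b', 'c d'], ['c d', 'e f'], ['e f', 'g h']]
```

The overlap keeps items that straddle a window boundary extractable from at least one window; duplicates this creates are exactly what Stage 2 removes.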

**Model Registries (32 models, fully independent):**
- EXTRACTION_MODELS (13 models, ≤1.7B): Extraction-optimized (temp 0.1-0.3)
  - Includes LFM2-Extract 350M & 1.2B (specialized extraction models)
  - 2 hybrid models with reasoning toggle (Qwen3 600M & 1.7B)
- SYNTHESIS_MODELS (16 models, 1B-30B): Synthesis-optimized (temp 0.7-0.9)
  - Fully independent from AVAILABLE_MODELS (no shared references)
  - 2 hybrid + 5 thinking-only models with reasoning support
- EMBEDDING_MODELS (4 models): granite-107m (default), granite-278m, gemma-300m, qwen-600m
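Stage 2 uses one of the embedding models above to drop near-duplicate items. A toy sketch of threshold-based semantic deduplication — `toy_embed` is a stand-in for the real GGUF embedding model, and the function name mirrors but is not the actual `deduplicate_items` implementation:

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u)) or 1.0
    nv = sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def deduplicate_items(items, embed, threshold=0.9):
    """Keep an item only if it is not too similar to any earlier kept item."""
    kept, kept_vecs = [], []
    for text in items:
        vec = embed(text)
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept

# Toy bag-of-words "embedding" standing in for the GGUF embedding model.
VOCAB = ["ship", "q3", "launch", "budget"]
def toy_embed(text):
    t = text.lower()
    return [1.0 if w in t else 0.0 for w in VOCAB]

print(deduplicate_items(
    ["Ship the Q3 launch", "Q3 launch must ship", "Approve the budget"],
    toy_embed))
# → ['Ship the Q3 launch', 'Approve the budget']
```

The `similarity_threshold` slider in the Advanced Mode UI maps onto the `threshold` parameter of this step.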

**Core Functions (app.py):**
- get_model_config(): Role-aware configuration resolver
- load_model_for_role(): Sequential loading with user n_ctx support
- unload_model(): Explicit memory cleanup
- build_extraction_system_prompt(): Bilingual + reasoning support
- summarize_advanced(): Main orchestrator (239 lines) with sequential model loading/unloading

**UI Implementation:**
- Mode tabs (Standard vs Advanced)
- 11 Advanced Mode controls (3 dropdowns, 4 sliders, 2 checkboxes, 2 radios)
- Conditional reasoning checkbox visibility per stage
- Submit button router (auto-detects mode and routes to appropriate handler)

**Features:**
- Sequential model loading/unloading (memory-safe for HF Spaces Free Tier)
- Bilingual support (auto-detect in extraction, Chinese conversion at end)
- Live progress streaming with ticker updates
- Trace logging (JSONL embedded in download JSON)
- Independent parameters per stage (no cross-contamination)

**Pipeline:**
Stage 1: Extraction → Parse transcript windows → Extract JSON items
Stage 2: Deduplication → Compute embeddings → Remove semantic duplicates
Stage 3: Synthesis → Generate executive summary from deduplicated items
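The three stages chain together as sketched below. This is a stand-in, not the real `summarize_advanced` orchestrator: the `extract`/`synthesize` callables are hypothetical, and `difflib` string similarity substitutes for the embedding-based Stage 2.

```python
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def run_pipeline(transcript, extract, synthesize, threshold=0.9):
    windows = [w for w in transcript.split("\n\n") if w.strip()]  # naive windowing
    items = [item for w in windows for item in extract(w)]        # Stage 1: extraction
    deduped = []                                                  # Stage 2: deduplication
    for item in items:
        if all(similar(item, kept) < threshold for kept in deduped):
            deduped.append(item)
    return synthesize(deduped)                                    # Stage 3: synthesis

summary = run_pipeline(
    "A: ship on Friday\n\nB: ship on Friday\n\nC: hire two engineers",
    extract=lambda w: [w.split(": ", 1)[1]],
    synthesize=lambda items: "; ".join(items),
)
print(summary)  # → ship on Friday; hire two engineers
```

The real pipeline additionally loads and unloads a separate model per stage, which is what keeps peak memory within the HF Spaces Free Tier budget.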

Code statistics: ~2,400 new lines, 3 new files, 11 new functions, 3 new classes

app.py CHANGED
@@ -660,6 +660,459 @@ AVAILABLE_MODELS = {
 DEFAULT_MODEL_KEY = "qwen3_600m_q4"
 
 def load_model(model_key: str = None, n_threads: int = 2) -> Tuple[Llama, str]:
     """
     Load model with CPU optimizations. Only reloads if model changes.
@@ -766,6 +1219,452 @@ def update_reasoning_visibility(model_key):
     return gr.update(visible=True, value=True, interactive=True, label="Enable Reasoning Mode")
 
 def download_summary_json(summary, thinking, model_key, language, metrics):
     """Generate JSON file with summary and metadata."""
     import json
@@ -1667,80 +2566,190 @@ def create_interface():
         )
 
         # ==========================================
-        # Section 2: Model Selection (Tabs)
         # ==========================================
-        with gr.Tabs() as model_tabs:
 
-            # --- Tab 1: Preset Models ---
-            with gr.TabItem("🤖 Preset Models"):
-                # Filter out custom_hf from preset choices
-                preset_choices = [
-                    (info["name"] + (" ⚡" if info.get("supports_reasoning", False) and not info.get("supports_toggle", False) else ""), key)
-                    for key, info in AVAILABLE_MODELS.items()
-                    if key != "custom_hf"
-                ]
-
-                model_dropdown = gr.Dropdown(
-                    choices=preset_choices,
-                    value=DEFAULT_MODEL_KEY,
-                    label="Select Model",
-                    info="Smaller = faster. ⚡ = Always-reasoning models."
-                )
-
-                enable_reasoning = gr.Checkbox(
-                    value=True,
-                    label="Enable Reasoning Mode",
-                    info="Uses /think for deeper analysis (slower) or /no_think for direct output (faster).",
-                    interactive=True,
-                    visible=AVAILABLE_MODELS[DEFAULT_MODEL_KEY].get("supports_toggle", False)
-                )
 
-                # Model info for preset models
-                gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Model Information</div>')
-                _default_threads = DEFAULT_CUSTOM_THREADS if DEFAULT_CUSTOM_THREADS > 0 else 2
-                info_output = gr.Markdown(
-                    value=get_model_info(DEFAULT_MODEL_KEY, n_threads=_default_threads)[0],
-                    elem_classes=["stats-grid"]
-                )
 
-            # --- Tab 2: Custom GGUF ---
-            with gr.TabItem("🔧 Custom GGUF"):
-                gr.HTML('<div style="font-size: 0.85em; color: #64748b; margin-bottom: 10px;">Load any GGUF model from HuggingFace Hub</div>')
 
-                # HF Hub Search Component
-                model_search_input = HuggingfaceHubSearch(
-                    label="🔍 Search HuggingFace Models",
-                    placeholder="Type model name (e.g., 'qwen', 'phi', 'llama')",
-                    search_type="model",
-                )
 
-                # File dropdown (populated after repo discovery)
-                custom_file_dropdown = gr.Dropdown(
-                    label="📦 Select GGUF File",
-                    choices=[],
-                    value=None,
-                    info="GGUF files appear after selecting a model above",
-                    interactive=True,
-                )
 
-                # Load button
-                load_btn = gr.Button("⬇️ Load Selected Model", variant="primary", size="sm")
 
-                # Status message
-                custom_status = gr.Textbox(
-                    label="Status",
-                    interactive=False,
-                    value="",
-                    visible=False,
-                )
 
-                retry_btn = gr.Button("🔄 Retry", variant="secondary", visible=False)
 
-                # Model info for custom models (shows after loading)
-                gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Custom Model Info</div>')
-                custom_info_output = gr.Markdown(
-                    value="*Load a model to see its specifications...*",
-                    elem_classes=["stats-grid"]
     )
 
     # ==========================================
@@ -1973,6 +2982,60 @@ def create_interface():
         outputs=[system_prompt_debug],
     )
 
     # Debounced auto-discovery for custom repo ID (500ms delay)
     import time as time_module
 
@@ -2138,10 +3201,129 @@ def create_interface():
         outputs=[custom_info_output],
     )
 
-    # Update submit button to include custom_model_state in inputs and system_prompt_debug in outputs
     submit_btn.click(
-        fn=summarize_streaming,
-        inputs=[file_input, text_input, model_dropdown, enable_reasoning, max_tokens, temperature_slider, top_p, top_k, language_selector, thread_config_dropdown, custom_threads_slider, custom_model_state],
         outputs=[thinking_output, summary_output, info_output, metrics_state, system_prompt_debug],
         show_progress="full"
     )
 
 DEFAULT_MODEL_KEY = "qwen3_600m_q4"
 
 
+# ===== ADVANCED MODE: EXTRACTION MODELS REGISTRY (13 models, ≤1.7B) =====
+# Used exclusively for Stage 1: Extraction (transcript windows → structured JSON)
+# Extraction-optimized settings: Low temperature (0.1-0.3) for deterministic output
+
+EXTRACTION_MODELS = {
+    "falcon_h1_100m": {
+        "name": "Falcon-H1 100M",
+        "repo_id": "mradermacher/Falcon-H1-Tiny-Multilingual-100M-Instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "100M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "gemma3_270m": {
+        "name": "Gemma-3 270M",
+        "repo_id": "unsloth/gemma-3-270m-it-qat-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "270M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "ernie_300m": {
+        "name": "ERNIE-4.5 0.3B (131K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-0.3B-PT-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "300M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_350m": {
+        "name": "Granite-4.0 350M",
+        "repo_id": "unsloth/granite-4.0-h-350m-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.1,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_350m": {
+        "name": "LFM2 350M",
+        "repo_id": "LiquidAI/LFM2-350M-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_extract_350m": {
+        "name": "LFM2-Extract 350M (Specialized)",
+        "repo_id": "LiquidAI/LFM2-350M-Extract-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "350M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "bitcpm4_500m": {
+        "name": "BitCPM4 0.5B (128K Context)",
+        "repo_id": "openbmb/BitCPM4-0.5B-GGUF",
+        "filename": "*q4_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "500M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "hunyuan_500m": {
+        "name": "Hunyuan 0.5B (256K Context)",
+        "repo_id": "mradermacher/Hunyuan-0.5B-Instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 262144,
+        "default_n_ctx": 4096,
+        "params_size": "500M",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_600m_q4": {
+        "name": "Qwen3 0.6B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-0.6B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "600M",
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 20,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_1_1b_q8": {
+        "name": "Granite 3.1 1B-A400M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-1b-a400m-instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "default_n_ctx": 4096,
+        "params_size": "1B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "falcon_h1_1.5b_q4": {
+        "name": "Falcon-H1 1.5B Q4",
+        "repo_id": "unsloth/Falcon-H1-1.5B-Deep-Instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.5B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_1.7b_q4": {
+        "name": "Qwen3 1.7B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-1.7B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.7B",
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.3,
+            "top_p": 0.9,
+            "top_k": 20,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_extract_1.2b": {
+        "name": "LFM2-Extract 1.2B (Specialized) ⭐",
+        "repo_id": "LiquidAI/LFM2-1.2B-Extract-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 32768,
+        "default_n_ctx": 4096,
+        "params_size": "1.2B",
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.2,
+            "top_p": 0.9,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+}
+
+DEFAULT_EXTRACTION_MODEL = "lfm2_extract_1.2b"
+
+
+# ===== ADVANCED MODE: SYNTHESIS MODELS REGISTRY (16 models, 1B-30B) =====
+# Used exclusively for Stage 3: Synthesis (deduplicated items → executive summary)
+# Synthesis-optimized settings: Higher temperature (0.7-0.9) for creative synthesis
+# FULLY INDEPENDENT from AVAILABLE_MODELS (no shared references)
+
+SYNTHESIS_MODELS = {
+    "granite_3_1_1b_q8": {
+        "name": "Granite 3.1 1B-A400M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-1b-a400m-instruct-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "falcon_h1_1.5b_q4": {
+        "name": "Falcon-H1 1.5B Q4",
+        "repo_id": "unsloth/Falcon-H1-1.5B-Deep-Instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_1.7b_q4": {
+        "name": "Qwen3 1.7B Q4 (32K Context)",
+        "repo_id": "unsloth/Qwen3-1.7B-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_3_2b_q4": {
+        "name": "Granite 3.3 2B Instruct (128K Context)",
+        "repo_id": "ibm-granite/granite-3.3-2b-instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "youtu_llm_2b_q8": {
+        "name": "Youtu-LLM 2B (128K Context)",
+        "repo_id": "tencent/Youtu-LLM-2B-GGUF",
+        "filename": "*Q8_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": True,  # Hybrid model
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "lfm2_2_6b_transcript": {
+        "name": "LFM2 2.6B Transcript (32K Context)",
+        "repo_id": "LiquidAI/LFM-2.6B-Transcript-GGUF",
+        "filename": "*Q4_0.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "breeze_3b_q4": {
+        "name": "Breeze 3B Q4 (32K Context)",
+        "repo_id": "mradermacher/breeze-3b-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 32768,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite_3_1_3b_q4": {
+        "name": "Granite 3.1 3B-A800M Instruct (128K Context)",
+        "repo_id": "bartowski/granite-3.1-3b-a800m-instruct-GGUF",
+        "filename": "*Q4_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "qwen3_4b_thinking_q3": {
+        "name": "Qwen3 4B Thinking (256K Context)",
+        "repo_id": "unsloth/Qwen3-4B-Thinking-2507-GGUF",
+        "filename": "*Q3_K_M.gguf",
+        "max_context": 262144,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "granite4_tiny_q3": {
+        "name": "Granite 4.0 Tiny 7B (128K Context)",
+        "repo_id": "ibm-research/granite-4.0-Tiny-7B-Instruct-GGUF",
+        "filename": "*Q3_K_M.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "ernie_21b_pt_q1": {
+        "name": "ERNIE-4.5 21B PT (128K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-21B-A3B-PT-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "ernie_21b_thinking_q1": {
+        "name": "ERNIE-4.5 21B Thinking (128K Context)",
+        "repo_id": "unsloth/ERNIE-4.5-21B-A3B-Thinking-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.9,
+            "top_p": 0.95,
+            "top_k": 50,
+            "repeat_penalty": 1.05,
+        },
+    },
+    "glm_4_7_flash_reap_30b": {
+        "name": "GLM-4.7-Flash-REAP-30B Thinking (128K Context)",
+        "repo_id": "unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 131072,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "glm_4_7_flash_30b_iq2": {
+        "name": "GLM-4.7-Flash-30B (Original) IQ2_XXS (128K Context)",
+        "repo_id": "bartowski/zai-org_GLM-4.7-Flash-GGUF",
+        "filename": "*IQ2_XXS.gguf",
+        "max_context": 131072,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 40,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_30b_thinking_q1": {
+        "name": "Qwen3 30B Thinking (256K Context)",
+        "repo_id": "unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 262144,
+        "supports_reasoning": True,
+        "supports_toggle": False,  # Thinking-only
+        "inference_settings": {
+            "temperature": 0.8,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+    "qwen3_30b_instruct_q1": {
+        "name": "Qwen3 30B Instruct (256K Context)",
+        "repo_id": "unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF",
+        "filename": "*TQ1_0.gguf",
+        "max_context": 262144,
+        "supports_reasoning": False,
+        "supports_toggle": False,
+        "inference_settings": {
+            "temperature": 0.7,
+            "top_p": 0.95,
+            "top_k": 30,
+            "repeat_penalty": 1.0,
+        },
+    },
+}
+
+DEFAULT_SYNTHESIS_MODEL = "qwen3_1.7b_q4"
+
+
 def load_model(model_key: str = None, n_threads: int = 2) -> Tuple[Llama, str]:
     """
     Load model with CPU optimizations. Only reloads if model changes.
 
     return gr.update(visible=True, value=True, interactive=True, label="Enable Reasoning Mode")
 
 
+# ===== ADVANCED MODE: HELPER FUNCTIONS =====
+
+def get_model_config(model_key: str, model_role: str) -> Dict[str, Any]:
+    """
+    Get model configuration based on role.
+
+    Ensures same model (e.g., qwen3_1.7b_q4) uses DIFFERENT settings
+    for extraction vs synthesis.
+
+    Args:
+        model_key: Model identifier (e.g., "qwen3_1.7b_q4")
+        model_role: "extraction" or "synthesis"
+
+    Returns:
+        Model configuration dict with role-specific settings
+
+    Raises:
+        ValueError: If model_key not available for specified role
+    """
+    if model_role == "extraction":
+        if model_key not in EXTRACTION_MODELS:
+            available = ", ".join(list(EXTRACTION_MODELS.keys())[:3]) + "..."
+            raise ValueError(
+                f"Model '{model_key}' not available for extraction role. "
+                f"Available: {available}"
+            )
+        return EXTRACTION_MODELS[model_key]
+
+    elif model_role == "synthesis":
+        if model_key not in SYNTHESIS_MODELS:
+            available = ", ".join(list(SYNTHESIS_MODELS.keys())[:3]) + "..."
+            raise ValueError(
+                f"Model '{model_key}' not available for synthesis role. "
+                f"Available: {available}"
+            )
+        return SYNTHESIS_MODELS[model_key]
+
+    else:
+        raise ValueError(
+            f"Unknown model role: '{model_role}'. "
+            f"Must be 'extraction' or 'synthesis'"
+        )
+
+
+def load_model_for_role(
+    model_key: str,
+    model_role: str,
+    n_threads: int = 2,
+    user_n_ctx: Optional[int] = None
+) -> Tuple[Llama, str]:
+    """
+    Load model with role-specific configuration.
+
+    Args:
+        model_key: Model identifier
+        model_role: "extraction" or "synthesis"
+        n_threads: CPU threads
+        user_n_ctx: User-specified n_ctx (extraction only, from slider)
+
+    Returns:
+        (loaded_model, info_message)
+
+    Raises:
+        Exception: If model loading fails (graceful failure)
+    """
+    try:
+        config = get_model_config(model_key, model_role)
+
+        # Calculate n_ctx
+        if model_role == "extraction" and user_n_ctx is not None:
+            n_ctx = min(user_n_ctx, config["max_context"], MAX_USABLE_CTX)
+        else:
+            # Synthesis or default extraction
+            n_ctx = min(config.get("max_context", 8192), MAX_USABLE_CTX)
+
+        # Detect GPU support
+        requested_ngl = int(os.environ.get("N_GPU_LAYERS", 0))
+        n_gpu_layers = requested_ngl
+
+        if requested_ngl != 0:
+            try:
+                from llama_cpp import llama_supports_gpu_offload
+                gpu_available = llama_supports_gpu_offload()
+                if not gpu_available:
+                    logger.warning("GPU requested but not available. Using CPU.")
+                    n_gpu_layers = 0
+            except Exception as e:
+                logger.warning(f"Could not detect GPU: {e}. Using CPU.")
+                n_gpu_layers = 0
+
+        # Load model
+        logger.info(f"Loading {config['name']} for {model_role} role (n_ctx={n_ctx:,})")
+
+        llm = Llama.from_pretrained(
+            repo_id=config["repo_id"],
+            filename=config["filename"],
+            n_ctx=n_ctx,
+            n_batch=min(2048, n_ctx),
+            n_threads=n_threads,
+            n_threads_batch=n_threads,
+            n_gpu_layers=n_gpu_layers,
+            verbose=False,
+            seed=1337,
+        )
+
+        info_msg = (
+            f"✅ Loaded: {config['name']} for {model_role} "
+            f"(n_ctx={n_ctx:,}, threads={n_threads})"
+        )
+        logger.info(info_msg)
+
+        return llm, info_msg
+
+    except Exception as e:
+        # Graceful failure - let user select different model
+        error_msg = (
+            f"❌ Failed to load {model_key} for {model_role}: {str(e)}\n\n"
+            f"Please select a different model and try again."
+        )
+        logger.error(error_msg, exc_info=True)
+        raise Exception(error_msg)
+
+
+def unload_model(llm: Optional[Llama], model_name: str = "model") -> None:
+    """Explicitly unload model and trigger garbage collection."""
+    if llm:
+        logger.info(f"Unloading {model_name}")
+        del llm
+        gc.collect()
+        time.sleep(0.5)  # Allow OS to reclaim memory
+
+
+def build_extraction_system_prompt(
+    output_language: str,
+    supports_reasoning: bool,
+    supports_toggle: bool,
+    enable_reasoning: bool
+) -> str:
+    """
+    Build extraction system prompt with optional reasoning mode.
+
+    Args:
+        output_language: "en" or "zh-TW" (auto-detected from transcript)
+        supports_reasoning: Model has reasoning capability
+        supports_toggle: User can toggle reasoning on/off
+        enable_reasoning: User's choice (only applies if supports_toggle=True)
+
+    Returns:
+        System prompt string
+    """
+    # Determine reasoning mode
+    if supports_toggle and enable_reasoning:
+        # Hybrid model with reasoning enabled
+        reasoning_instruction_en = """
+Use your reasoning capabilities to analyze the content before extracting.
+
+Your reasoning should:
+1. Identify key decision points and action items
+2. Distinguish explicit decisions from general discussion
+3. Categorize information appropriately (action vs point vs question)
+
+After reasoning, output ONLY valid JSON."""
+
+        reasoning_instruction_zh = """
+使用你的推理能力分析內容後再進行提取。
+
+你的推理應該:
+1. 識別關鍵決策點和行動項目
+2. 區分明確決策與一般討論
+3. 適當分類資訊(行動 vs 要點 vs 問題)
+
+推理後,僅輸出 JSON。"""
+    else:
+        reasoning_instruction_en = ""
+        reasoning_instruction_zh = ""
+
+    # Build full prompt
+    if output_language == "zh-TW":
+        return f"""你是會議分析助手。從逐字稿中提取結構化資訊。
+{reasoning_instruction_zh}
+
+僅輸出有效的 JSON,使用此精確架構:
+{{
+  "action_items": ["包含負責人和截止日期的任務", ...],
+  "decisions": ["包含理由的決策", ...],
+  "key_points": ["重要討論要點", ...],
+  "open_questions": ["未解決的問題或疑慮", ...]
+}}
+
+規則:
+- 每個項目必須是完整、獨立的句子
+- 在每個項目中包含上下文(誰、什麼、何時)
+- 如果類別沒有項目,使用空陣列 []
+- 僅輸出 JSON,無 markdown,無解釋"""
+
+    else:  # English
+        return f"""You are a meeting analysis assistant. Extract structured information from transcript.
+{reasoning_instruction_en}
+
+Output ONLY valid JSON with this exact schema:
+{{
+  "action_items": ["Task with owner and deadline", ...],
+  "decisions": ["Decision made with rationale", ...],
+  "key_points": ["Important discussion point", ...],
+  "open_questions": ["Unresolved question or concern", ...]
+}}
+
+Rules:
+- Each item must be a complete, standalone sentence
+- Include context (who, what, when) in each item
+- If a category has no items, use empty array []
+- Output ONLY JSON, no markdown, no explanations"""
+
+
+def summarize_advanced(
+    transcript: str,
+    extraction_model_key: str,
+    embedding_model_key: str,
+    synthesis_model_key: str,
+    extraction_n_ctx: int,
+    overlap_turns: int,
+    similarity_threshold: float,
+    enable_extraction_reasoning: bool,
+    enable_synthesis_reasoning: bool,
+    output_language: str,
+    max_tokens: int,
+    enable_logging: bool,
+    n_threads: int = 2
+) -> Generator[Dict[str, Any], None, None]:
+    """
+    Advanced 3-stage pipeline: Extraction → Deduplication → Synthesis.
+
+    Yields progress updates as dicts with keys:
+    - stage: "extraction" | "deduplication" | "synthesis" | "complete" | "error"
+    - ticker: Progress ticker text (for extraction)
+    - thinking: Thinking/reasoning content
+    - summary: Final summary (for synthesis/complete)
+    - error: Error message (if any)
+    - trace_stats: Summary statistics (on complete)
+    """
+    from meeting_summarizer.trace import Tracer
+    from meeting_summarizer.extraction import (
+        NativeTokenizer, EmbeddingModel, Window,
+        stream_extract_from_window, deduplicate_items, stream_synthesize_executive_summary
+    )
+
+    # Initialize tracer
+    tracer = Tracer(enabled=enable_logging)
+    tokenizer = NativeTokenizer()
+
+    extraction_llm = None
+    embedding_model = None
+    synthesis_llm = None
+
+    try:
+        # ===== STAGE 1: EXTRACTION =====
+        yield {"stage": "extraction", "ticker": "Loading extraction model...", "thinking": "", "summary": ""}
+
+        extraction_llm, load_msg = load_model_for_role(
+            model_key=extraction_model_key,
+            model_role="extraction",
+            n_threads=n_threads,
+            user_n_ctx=extraction_n_ctx
+        )
+
+        yield {"stage": "extraction", "ticker": load_msg, "thinking": "", "summary": ""}
+
+        # Create windows from transcript (simple split by turns for now)
+        # In production, this would be more sophisticated
+        lines = [l.strip() for l in transcript.split('\n') if l.strip()]
+
+        # Simple windowing: split into chunks based on token count
+        windows = []
+        current_window = []
+        current_tokens = 0
+        window_id = 1
+
+        for line_num, line in enumerate(lines):
+            line_tokens = tokenizer.count(line)
+
+            if current_tokens + line_tokens > extraction_n_ctx and current_window:
+                # Create window
+                window_content = '\n'.join(current_window)
+                windows.append(Window(
+                    id=window_id,
+                    content=window_content,
+                    start_turn=line_num - len(current_window),
+                    end_turn=line_num - 1,
+                    token_count=current_tokens
+                ))
+                window_id += 1
+
+                # Start new window with overlap
+                overlap_lines = current_window[-overlap_turns:] if len(current_window) >= overlap_turns else current_window
+                current_window = overlap_lines + [line]
+                current_tokens = sum(tokenizer.count(l) for l in current_window)
+            else:
+                current_window.append(line)
+                current_tokens += line_tokens
+
+        # Add final window
+        if current_window:
+            window_content = '\n'.join(current_window)
+            windows.append(Window(
+                id=window_id,
+                content=window_content,
+                start_turn=len(lines) - len(current_window),
+                end_turn=len(lines) - 1,
+                token_count=current_tokens
+            ))
+
+        total_windows = len(windows)
+        yield {"stage": "extraction", "ticker": f"Created {total_windows} windows", "thinking": "", "summary": ""}
+
+        # Extract from each window
+        all_items = {"action_items": [], "decisions": [], "key_points": [], "open_questions": []}
+
+        extraction_config = get_model_config(extraction_model_key, "extraction")
1540
+
1541
+ for window in windows:
1542
+ for ticker, thinking, partial_items, is_complete in stream_extract_from_window(
1543
+ extraction_llm=extraction_llm,
1544
+ window=window,
1545
+ window_id=window.id,
1546
+ total_windows=total_windows,
1547
+ tracer=tracer,
1548
+ tokenizer=tokenizer,
1549
+ model_config=extraction_config,
1550
+ enable_reasoning=enable_extraction_reasoning
1551
+ ):
1552
+ yield {"stage": "extraction", "ticker": ticker, "thinking": thinking, "summary": ""}
1553
+
1554
+ if is_complete:
1555
+ # Merge items
1556
+ for category, items in partial_items.items():
1557
+ all_items[category].extend(items)
1558
+
1559
+ # Unload extraction model
1560
+ unload_model(extraction_llm, "extraction model")
1561
+ extraction_llm = None
1562
+
1563
+ total_extracted = sum(len(v) for v in all_items.values())
1564
+ yield {"stage": "extraction", "ticker": f"✅ Extracted {total_extracted} total items", "thinking": "", "summary": ""}
1565
+
1566
+ # ===== STAGE 2: DEDUPLICATION =====
1567
+ yield {"stage": "deduplication", "ticker": "Loading embedding model...", "thinking": "", "summary": ""}
1568
+
1569
+ embedding_model = EmbeddingModel(embedding_model_key, n_threads=n_threads)
1570
+ load_msg = embedding_model.load()
1571
+
1572
+ yield {"stage": "deduplication", "ticker": load_msg, "thinking": "", "summary": ""}
1573
+
1574
+ # Deduplicate
1575
+ deduplicated_items = deduplicate_items(
1576
+ all_items=all_items,
1577
+ embedding_model=embedding_model,
1578
+ similarity_threshold=similarity_threshold,
1579
+ tracer=tracer
1580
+ )
1581
+
1582
+ # Unload embedding model
1583
+ embedding_model.unload()
1584
+ embedding_model = None
1585
+
1586
+ total_deduplicated = sum(len(v) for v in deduplicated_items.values())
1587
+ duplicates_removed = total_extracted - total_deduplicated
1588
+
1589
+ yield {
1590
+ "stage": "deduplication",
1591
+ "ticker": f"✅ Deduplication complete: {total_extracted} → {total_deduplicated} ({duplicates_removed} duplicates removed)",
1592
+ "thinking": "",
1593
+ "summary": ""
1594
+ }
1595
+
1596
+ # ===== STAGE 3: SYNTHESIS =====
1597
+ yield {"stage": "synthesis", "ticker": "", "thinking": "", "summary": "Loading synthesis model..."}
1598
+
1599
+ synthesis_llm, load_msg = load_model_for_role(
1600
+ model_key=synthesis_model_key,
1601
+ model_role="synthesis",
1602
+ n_threads=n_threads
1603
+ )
1604
+
1605
+ yield {"stage": "synthesis", "ticker": "", "thinking": "", "summary": load_msg}
1606
+
1607
+ # Synthesize
1608
+ synthesis_config = get_model_config(synthesis_model_key, "synthesis")
1609
+ final_summary = ""
1610
+ final_thinking = ""
1611
+
1612
+ for summary_chunk, thinking_chunk, is_complete in stream_synthesize_executive_summary(
1613
+ synthesis_llm=synthesis_llm,
1614
+ deduplicated_items=deduplicated_items,
1615
+ model_config=synthesis_config,
1616
+ output_language=output_language,
1617
+ enable_reasoning=enable_synthesis_reasoning,
1618
+ max_tokens=max_tokens,
1619
+ tracer=tracer
1620
+ ):
1621
+ final_summary = summary_chunk
1622
+ final_thinking = thinking_chunk
1623
+ yield {"stage": "synthesis", "ticker": "", "thinking": thinking_chunk, "summary": summary_chunk}
1624
+
1625
+ # Unload synthesis model
1626
+ unload_model(synthesis_llm, "synthesis model")
1627
+ synthesis_llm = None
1628
+
1629
+ # Apply Chinese conversion if needed
1630
+ if output_language == "zh-TW":
1631
+ converter = OpenCC('s2twp')
1632
+ final_summary = converter.convert(final_summary)
1633
+ if final_thinking:
1634
+ final_thinking = converter.convert(final_thinking)
1635
+
1636
+ # Get trace stats
1637
+ trace_stats = tracer.get_summary_stats()
1638
+
1639
+ yield {
1640
+ "stage": "complete",
1641
+ "ticker": "",
1642
+ "thinking": final_thinking,
1643
+ "summary": final_summary,
1644
+ "trace_stats": trace_stats,
1645
+ "trace_json": tracer.get_trace_json()
1646
+ }
1647
+
1648
+ except Exception as e:
1649
+ logger.error(f"Advanced pipeline error: {e}", exc_info=True)
1650
+
1651
+ # Cleanup
1652
+ if extraction_llm:
1653
+ unload_model(extraction_llm, "extraction model")
1654
+ if embedding_model:
1655
+ embedding_model.unload()
1656
+ if synthesis_llm:
1657
+ unload_model(synthesis_llm, "synthesis model")
1658
+
1659
+ yield {
1660
+ "stage": "error",
1661
+ "ticker": "",
1662
+ "thinking": "",
1663
+ "summary": "",
1664
+ "error": str(e)
1665
+ }
1666
+
1667
+
1668
  def download_summary_json(summary, thinking, model_key, language, metrics):
1669
  """Generate JSON file with summary and metadata."""
1670
  import json
 
2566
  )
2567
 
2568
  # ==========================================
2569
+ # Section 2: Mode Selection (Standard vs Advanced)
2570
  # ==========================================
2571
+ with gr.Tabs() as mode_tabs:
2572
 
2573
+ # ===== STANDARD MODE =====
2574
+ with gr.TabItem("📊 Standard Mode"):
2575
+ gr.HTML('<div style="font-size: 0.9em; color: #64748b; margin-bottom: 10px;">Single-model direct summarization</div>')
 
2576
 
2577
+ with gr.Tabs() as model_tabs:
2578
+
2579
+ # --- Tab 1: Preset Models ---
2580
+ with gr.TabItem("🤖 Preset Models"):
2581
+ # Filter out custom_hf from preset choices
2582
+ preset_choices = [
2583
+ (info["name"] + (" ⚡" if info.get("supports_reasoning", False) and not info.get("supports_toggle", False) else ""), key)
2584
+ for key, info in AVAILABLE_MODELS.items()
2585
+ if key != "custom_hf"
2586
+ ]
2587
+
2588
+ model_dropdown = gr.Dropdown(
2589
+ choices=preset_choices,
2590
+ value=DEFAULT_MODEL_KEY,
2591
+ label="Select Model",
2592
+ info="Smaller = faster. ⚡ = Always-reasoning models."
2593
+ )
2594
+
2595
+ enable_reasoning = gr.Checkbox(
2596
+ value=True,
2597
+ label="Enable Reasoning Mode",
2598
+ info="Uses /think for deeper analysis (slower) or /no_think for direct output (faster).",
2599
+ interactive=True,
2600
+ visible=AVAILABLE_MODELS[DEFAULT_MODEL_KEY].get("supports_toggle", False)
2601
+ )
2602
+
2603
+ # Model info for preset models
2604
+ gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Model Information</div>')
2605
+ _default_threads = DEFAULT_CUSTOM_THREADS if DEFAULT_CUSTOM_THREADS > 0 else 2
2606
+ info_output = gr.Markdown(
2607
+ value=get_model_info(DEFAULT_MODEL_KEY, n_threads=_default_threads)[0],
2608
+ elem_classes=["stats-grid"]
2609
+ )
2610
+
2611
+ # --- Tab 2: Custom GGUF ---
2612
+ with gr.TabItem("🔧 Custom GGUF"):
2613
+ gr.HTML('<div style="font-size: 0.85em; color: #64748b; margin-bottom: 10px;">Load any GGUF model from HuggingFace Hub</div>')
2614
+
2615
+ # HF Hub Search Component
2616
+ model_search_input = HuggingfaceHubSearch(
2617
+ label="🔍 Search HuggingFace Models",
2618
+ placeholder="Type model name (e.g., 'qwen', 'phi', 'llama')",
2619
+ search_type="model",
2620
+ )
2621
+
2622
+ # File dropdown (populated after repo discovery)
2623
+ custom_file_dropdown = gr.Dropdown(
2624
+ label="📦 Select GGUF File",
2625
+ choices=[],
2626
+ value=None,
2627
+ info="GGUF files appear after selecting a model above",
2628
+ interactive=True,
2629
+ )
2630
+
2631
+ # Load button
2632
+ load_btn = gr.Button("⬇️ Load Selected Model", variant="primary", size="sm")
2633
+
2634
+ # Status message
2635
+ custom_status = gr.Textbox(
2636
+ label="Status",
2637
+ interactive=False,
2638
+ value="",
2639
+ visible=False,
2640
+ )
2641
+
2642
+ retry_btn = gr.Button("🔄 Retry", variant="secondary", visible=False)
2643
+
2644
+ # Model info for custom models (shows after loading)
2645
+ gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">📊</span> Custom Model Info</div>')
2646
+ custom_info_output = gr.Markdown(
2647
+ value="*Load a model to see its specifications...*",
2648
+ elem_classes=["stats-grid"]
2649
+ )
2650
 
2651
+ # ===== ADVANCED MODE =====
2652
+ with gr.TabItem("🧠 Advanced Mode (3-Model Pipeline)"):
2653
+ gr.HTML('<div style="font-size: 0.9em; color: #64748b; margin-bottom: 10px;">Extraction → Deduplication → Synthesis</div>')
2654
 
2655
+ # Model Selection Row
2656
+ gr.HTML('<div class="section-header"><span class="section-icon">🤖</span> Model Selection</div>')
2657
+ with gr.Row():
2658
+ extraction_model = gr.Dropdown(
2659
+ choices=[(EXTRACTION_MODELS[k]["name"], k) for k in EXTRACTION_MODELS.keys()],
2660
+ value=DEFAULT_EXTRACTION_MODEL,
2661
+ label="🔍 Stage 1: Extraction Model (≤1.7B)",
2662
+ info="Extracts structured items from windows"
2663
+ )
2664
+
2665
+ embedding_model = gr.Dropdown(
2666
+ choices=[("granite-107m", "granite-107m"), ("granite-278m", "granite-278m"),
2667
+ ("gemma-300m", "gemma-300m"), ("qwen-600m", "qwen-600m")],
2668
+ value="granite-107m",
2669
+ label="🧬 Stage 2: Embedding Model",
2670
+ info="Deduplication via semantic similarity"
2671
+ )
2672
+
2673
+ synthesis_model = gr.Dropdown(
2674
+ choices=[(SYNTHESIS_MODELS[k]["name"], k) for k in SYNTHESIS_MODELS.keys()],
2675
+ value=DEFAULT_SYNTHESIS_MODEL,
2676
+ label="✨ Stage 3: Synthesis Model (1B-30B)",
2677
+ info="Generates executive summary"
2678
+ )
2679
 
2680
+ # Extraction Parameters Row
2681
+ gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">⚙️</span> Extraction Parameters</div>')
2682
+ with gr.Row():
2683
+ extraction_n_ctx = gr.Slider(
2684
+ minimum=2048,
2685
+ maximum=8192,
2686
+ step=1024,
2687
+ value=4096,
2688
+ label="🪟 Extraction Context Window (n_ctx)",
2689
+ info="Smaller = more windows, Larger = fewer windows"
2690
+ )
2691
+
2692
+ overlap_turns = gr.Slider(
2693
+ minimum=1,
2694
+ maximum=5,
2695
+ step=1,
2696
+ value=2,
2697
+ label="🔄 Window Overlap (turns)",
2698
+ info="Speaker turns shared between windows"
2699
+ )
2700
 
2701
+ # Deduplication Parameters Row
2702
+ with gr.Row():
2703
+ similarity_threshold = gr.Slider(
2704
+ minimum=0.70,
2705
+ maximum=0.95,
2706
+ step=0.01,
2707
+ value=0.85,
2708
+ label="🎯 Deduplication Similarity Threshold",
2709
+ info="Higher = stricter duplicate detection"
2710
+ )
2711
 
2712
+ # Reasoning Controls (Separate checkboxes)
2713
+ gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">🧠</span> Reasoning Configuration</div>')
2714
+ with gr.Row():
2715
+ enable_extraction_reasoning = gr.Checkbox(
2716
+ value=False,
2717
+ visible=False,
2718
+ label="🧠 Enable Reasoning for Extraction",
2719
+ info="Thinking before JSON (Qwen3 hybrid models only)"
2720
+ )
2721
+
2722
+ enable_synthesis_reasoning = gr.Checkbox(
2723
+ value=True,
2724
+ visible=True,
2725
+ label="🧠 Enable Reasoning for Synthesis",
2726
+ info="Thinking for final summary generation"
2727
+ )
2728
 
2729
+ # Output Settings Row
2730
+ gr.HTML('<div class="section-header" style="margin-top: 12px;"><span class="section-icon">🌐</span> Output Settings</div>')
2731
+ with gr.Row():
2732
+ adv_output_language = gr.Radio(
2733
+ choices=["en", "zh-TW"],
2734
+ value="en",
2735
+ label="Output Language",
2736
+ info="Extraction auto-detects, synthesis uses this"
2737
+ )
2738
+
2739
+ adv_max_tokens = gr.Slider(
2740
+ minimum=512,
2741
+ maximum=4096,
2742
+ step=128,
2743
+ value=2048,
2744
+ label="📏 Max Synthesis Tokens",
2745
+ info="Maximum tokens for final summary"
2746
+ )
2747
 
2748
+ # Logging Control
2749
+ enable_detailed_logging = gr.Checkbox(
2750
+ value=True,
2751
+ label="📝 Enable Detailed Trace Logging",
2752
+ info="Save JSONL trace (embedded in download JSON)"
2753
  )
2754
 
2755
  # ==========================================
 
2982
  outputs=[system_prompt_debug],
2983
  )
2984
 
2985
+ # ===== ADVANCED MODE EVENT HANDLERS =====
2986
+
2987
+ # Update extraction reasoning checkbox visibility when extraction model changes
2988
+ def update_extraction_reasoning_visibility(model_key):
2989
+ """Show/hide extraction reasoning checkbox based on model capabilities."""
2990
+ if model_key not in EXTRACTION_MODELS:
2991
+ return gr.update(visible=False, value=False)
2992
+
2993
+ config = EXTRACTION_MODELS[model_key]
2994
+ supports_toggle = config.get("supports_toggle", False)
2995
+
2996
+ if supports_toggle:
2997
+ # Hybrid model
2998
+ return gr.update(visible=True, value=False, interactive=True, label="🧠 Enable Reasoning for Extraction")
2999
+ elif config.get("supports_reasoning", False):
3000
+ # Thinking-only model (none currently in extraction)
3001
+ return gr.update(visible=True, value=True, interactive=False, label="🧠 Reasoning Mode (Always On)")
3002
+ else:
3003
+ # Non-reasoning model
3004
+ return gr.update(visible=False, value=False)
3005
+
3006
+ # Update synthesis reasoning checkbox visibility when synthesis model changes
3007
+ def update_synthesis_reasoning_visibility(model_key):
3008
+ """Show/hide synthesis reasoning checkbox based on model capabilities."""
3009
+ if model_key not in SYNTHESIS_MODELS:
3010
+ return gr.update(visible=False, value=False)
3011
+
3012
+ config = SYNTHESIS_MODELS[model_key]
3013
+ supports_reasoning = config.get("supports_reasoning", False)
3014
+ supports_toggle = config.get("supports_toggle", False)
3015
+
3016
+ if not supports_reasoning:
3017
+ # Non-reasoning model
3018
+ return gr.update(visible=False, value=False)
3019
+ elif supports_reasoning and not supports_toggle:
3020
+ # Thinking-only model
3021
+ return gr.update(visible=True, value=True, interactive=False, label="⚡ Reasoning Mode (Always On)")
3022
+ else:
3023
+ # Hybrid model
3024
+ return gr.update(visible=True, value=True, interactive=True, label="🧠 Enable Reasoning for Synthesis")
3025
+
3026
+ # Wire up Advanced Mode event handlers
3027
+ extraction_model.change(
3028
+ fn=update_extraction_reasoning_visibility,
3029
+ inputs=[extraction_model],
3030
+ outputs=[enable_extraction_reasoning]
3031
+ )
3032
+
3033
+ synthesis_model.change(
3034
+ fn=update_synthesis_reasoning_visibility,
3035
+ inputs=[synthesis_model],
3036
+ outputs=[enable_synthesis_reasoning]
3037
+ )
3038
+
3039
  # Debounced auto-discovery for custom repo ID (500ms delay)
3040
  import time as time_module
3041
 
 
3201
  outputs=[custom_info_output],
3202
  )
3203
 
3204
+ # ===== SUBMIT BUTTON ROUTER =====
3205
+ # Routes to Standard or Advanced mode based on active tab
3206
+
3207
+ def route_summarize(
3208
+ # Standard mode inputs
3209
+ file_input_val, text_input_val, model_dropdown_val, enable_reasoning_val,
3210
+ max_tokens_val, temperature_val, top_p_val, top_k_val, language_val,
3211
+ thread_config_val, custom_threads_val, custom_model_val,
3212
+ # Advanced mode inputs
3213
+ extraction_model_val, embedding_model_val, synthesis_model_val,
3214
+ extraction_n_ctx_val, overlap_turns_val, similarity_threshold_val,
3215
+ enable_extraction_reasoning_val, enable_synthesis_reasoning_val,
3216
+ adv_output_language_val, adv_max_tokens_val, enable_logging_val,
3217
+ # Mode selector
3218
+ mode_tabs_val
3219
+ ):
3220
+ """Route to Standard or Advanced mode based on selected tab."""
3221
+
3222
+ # Determine active mode (Gradio returns index of active tab)
3223
+ # 0 = Standard Mode, 1 = Advanced Mode
3224
+ is_advanced_mode = (mode_tabs_val == 1)
3225
+
3226
+ if is_advanced_mode:
3227
+ # Advanced Mode: Use summarize_advanced()
3228
+ # Get n_threads
3229
+ thread_map = {"free": 2, "upgrade": 8, "custom": max(1, custom_threads_val)}
3230
+ n_threads = thread_map.get(thread_config_val, 2)
3231
+
3232
+ # Get transcript
3233
+ transcript = ""
3234
+ if file_input_val:
3235
+ with open(file_input_val, 'r', encoding='utf-8') as f:
3236
+ transcript = f.read()
3237
+ elif text_input_val:
3238
+ transcript = text_input_val
3239
+ else:
3240
+ yield ("", "⚠️ Please upload a file or paste text", "", {}, "")
3241
+ return
3242
+
3243
+ # Stream Advanced Mode pipeline
3244
+ for update in summarize_advanced(
3245
+ transcript=transcript,
3246
+ extraction_model_key=extraction_model_val,
3247
+ embedding_model_key=embedding_model_val,
3248
+ synthesis_model_key=synthesis_model_val,
3249
+ extraction_n_ctx=extraction_n_ctx_val,
3250
+ overlap_turns=overlap_turns_val,
3251
+ similarity_threshold=similarity_threshold_val,
3252
+ enable_extraction_reasoning=enable_extraction_reasoning_val,
3253
+ enable_synthesis_reasoning=enable_synthesis_reasoning_val,
3254
+ output_language=adv_output_language_val,
3255
+ max_tokens=adv_max_tokens_val,
3256
+ enable_logging=enable_logging_val,
3257
+ n_threads=n_threads
3258
+ ):
3259
+ stage = update.get("stage", "")
3260
+
3261
+ if stage == "extraction":
3262
+ ticker = update.get("ticker", "")
3263
+ thinking = update.get("thinking", "")
3264
+ yield (thinking, ticker, "", {}, "")
3265
+
3266
+ elif stage == "deduplication":
3267
+ ticker = update.get("ticker", "")
3268
+ yield ("", ticker, "", {}, "")
3269
+
3270
+ elif stage == "synthesis":
3271
+ thinking = update.get("thinking", "")
3272
+ summary = update.get("summary", "")
3273
+ yield (thinking, summary, "", {}, "")
3274
+
3275
+ elif stage == "complete":
3276
+ thinking = update.get("thinking", "")
3277
+ summary = update.get("summary", "")
3278
+ trace_stats = update.get("trace_stats", {})
3279
+
3280
+ # Format info message
3281
+ info_msg = f"""**Advanced Mode Complete**
3282
+ - Total Windows: {trace_stats.get('total_windows', 0)}
3283
+ - Items Extracted: {trace_stats.get('total_items_extracted', 0)}
3284
+ - Items After Dedup: {trace_stats.get('total_items_after_dedup', 0)}
3285
+ - Duplicates Removed: {trace_stats.get('total_duplicates_removed', 0)}
3286
+ - Total Time: {trace_stats.get('total_elapsed_seconds', 0):.1f}s"""
3287
+
3288
+ # Store trace for download
3289
+ metrics = {
3290
+ "mode": "advanced",
3291
+ "trace_stats": trace_stats,
3292
+ "trace_json": update.get("trace_json", [])
3293
+ }
3294
+
3295
+ yield (thinking, summary, info_msg, metrics, "Advanced Mode (3-Model Pipeline)")
3296
+
3297
+ elif stage == "error":
3298
+ error = update.get("error", "Unknown error")
3299
+ yield ("", f"❌ Error: {error}", "", {}, "")
3300
+ return
3301
+
3302
+ else:
3303
+ # Standard Mode: Use existing summarize_streaming()
3304
+ for thinking, summary, info, metrics, system_prompt in summarize_streaming(
3305
+ file_input_val, text_input_val, model_dropdown_val, enable_reasoning_val,
3306
+ max_tokens_val, temperature_val, top_p_val, top_k_val, language_val,
3307
+ thread_config_val, custom_threads_val, custom_model_val
3308
+ ):
3309
+ yield (thinking, summary, info, metrics, system_prompt)
3310
+
3311
+ # Wire up submit button with router
3312
  submit_btn.click(
3313
+ fn=route_summarize,
3314
+ inputs=[
3315
+ # Standard mode inputs
3316
+ file_input, text_input, model_dropdown, enable_reasoning,
3317
+ max_tokens, temperature_slider, top_p, top_k, language_selector,
3318
+ thread_config_dropdown, custom_threads_slider, custom_model_state,
3319
+ # Advanced mode inputs
3320
+ extraction_model, embedding_model, synthesis_model,
3321
+ extraction_n_ctx, overlap_turns, similarity_threshold,
3322
+ enable_extraction_reasoning, enable_synthesis_reasoning,
3323
+ adv_output_language, adv_max_tokens, enable_detailed_logging,
3324
+ # Mode selector
3325
+ mode_tabs
3326
+ ],
3327
  outputs=[thinking_output, summary_output, info_output, metrics_state, system_prompt_debug],
3328
  show_progress="full"
3329
  )
meeting_summarizer/__init__.py ADDED
@@ -0,0 +1,13 @@
1
+ """
2
+ Tiny Scribe - Meeting Summarizer Module
3
+
4
+ This module provides advanced 3-stage meeting summarization:
5
+ 1. Extraction: Extract structured items from transcript windows
6
+ 2. Deduplication: Remove semantic duplicates using embeddings
7
+ 3. Synthesis: Generate executive summary from deduplicated items
8
+ """
9
+
10
+ __version__ = "1.0.0"
11
+
12
+ # Package exports will be added as we implement components
13
+ __all__ = []
meeting_summarizer/extraction.py ADDED
@@ -0,0 +1,705 @@
1
+ """
2
+ Advanced Extraction Pipeline
3
+
4
+ Provides:
5
+ 1. EMBEDDING_MODELS registry (4 models for deduplication)
6
+ 2. NativeTokenizer - Count tokens without llama.cpp
7
+ 3. EmbeddingModel - Load/compute embeddings
8
+ 4. format_progress_ticker - Live UI updates
9
+ 5. stream_extract_from_window - Stage 1: Extraction
10
+ 6. deduplicate_items - Stage 2: Deduplication
11
+ 7. stream_synthesize_executive_summary - Stage 3: Synthesis
12
+ """
13
+
14
+ import re
15
+ import json
16
+ import time
17
+ import logging
18
+ from typing import Dict, List, Any, Tuple, Generator, Optional
19
+ from dataclasses import dataclass
20
+ import numpy as np
21
+ from llama_cpp import Llama
22
+
23
+ logger = logging.getLogger(__name__)
24
+
25
+
26
+ # ===== EMBEDDING MODELS REGISTRY =====
27
+
28
+ EMBEDDING_MODELS = {
29
+ "granite-107m": {
30
+ "name": "Granite 107M Multilingual (384-dim)",
31
+ "repo_id": "ibm-granite/granite-embedding-107m-multilingual",
32
+ "filename": "*Q8_0.gguf",
33
+ "embedding_dim": 384,
34
+ "max_context": 2048,
35
+ "description": "Fastest, multilingual, good for quick deduplication",
36
+ },
37
+ "granite-278m": {
38
+ "name": "Granite 278M Multilingual (768-dim)",
39
+ "repo_id": "ibm-granite/granite-embedding-278m-multilingual",
40
+ "filename": "*Q8_0.gguf",
41
+ "embedding_dim": 768,
42
+ "max_context": 2048,
43
+ "description": "Balanced speed/quality, multilingual",
44
+ },
45
+ "gemma-300m": {
46
+ "name": "Embedding Gemma 300M (768-dim)",
47
+ "repo_id": "unsloth/embeddinggemma-300m-GGUF",
48
+ "filename": "*Q8_0.gguf",
49
+ "embedding_dim": 768,
50
+ "max_context": 2048,
51
+ "description": "Google embedding model, strong semantics",
52
+ },
53
+ "qwen-600m": {
54
+ "name": "Qwen3 Embedding 600M (1024-dim)",
55
+ "repo_id": "Qwen/Qwen3-Embedding-0.6B-GGUF",
56
+ "filename": "*Q8_0.gguf",
57
+ "embedding_dim": 1024,
58
+ "max_context": 2048,
59
+ "description": "Highest quality, best for critical dedup",
60
+ },
61
+ }
62
+
63
+
64
+ # ===== NATIVE TOKENIZER =====
65
+
66
+ class NativeTokenizer:
67
+ """
68
+ Simple tokenizer for counting tokens without llama.cpp.
69
+ Uses GPT-2 style approximation: ~1 token per 4 characters.
70
+ """
71
+
72
+ def __init__(self):
73
+ """Initialize tokenizer."""
74
+ self.chars_per_token = 4 # Conservative estimate
75
+
76
+ def count(self, text: str) -> int:
77
+ """
78
+ Count tokens in text.
79
+
80
+ Args:
81
+ text: Input text
82
+
83
+ Returns:
84
+ Approximate token count
85
+ """
86
+ if not text:
87
+ return 0
88
+
89
+ # Simple heuristic: 1 token ≈ 4 characters for English
90
+ # Adjust for CJK characters (Chinese/Japanese/Korean)
91
+ cjk_chars = len(re.findall(r'[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]', text))
92
+ non_cjk_chars = len(text) - cjk_chars
93
+
94
+ # CJK: 1 char ≈ 1 token, Non-CJK: 4 chars ≈ 1 token
95
+ tokens = cjk_chars + (non_cjk_chars // self.chars_per_token)
96
+
97
+ return max(1, tokens) # Minimum 1 token
98
+
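As a sanity check, the heuristic above can be exercised in isolation (re-implemented here for illustration; `approx_tokens` is a hypothetical name, not a module export):

```python
import re

def approx_tokens(text, chars_per_token=4):
    """CJK characters count as ~1 token each; everything else as ~4 chars/token."""
    if not text:
        return 0
    cjk = len(re.findall(r'[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]', text))
    other = len(text) - cjk
    return max(1, cjk + other // chars_per_token)
```

So an 11-character English string counts as 2 tokens, while a 2-character Chinese string also counts as 2 — the CJK adjustment is what keeps window sizing reasonable for zh-TW transcripts.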
99
+
100
+ # ===== EMBEDDING MODEL =====
101
+
102
+ class EmbeddingModel:
103
+ """Wrapper for embedding models used in deduplication."""
104
+
105
+ def __init__(self, model_key: str, n_threads: int = 2):
106
+ """
107
+ Initialize embedding model.
108
+
109
+ Args:
110
+ model_key: Key from EMBEDDING_MODELS registry
111
+ n_threads: CPU threads for inference
112
+ """
113
+ if model_key not in EMBEDDING_MODELS:
114
+ raise ValueError(f"Unknown embedding model: {model_key}")
115
+
116
+ self.model_key = model_key
117
+ self.config = EMBEDDING_MODELS[model_key]
118
+ self.n_threads = n_threads
119
+ self.llm: Optional[Llama] = None
120
+
121
+ def load(self) -> str:
122
+ """
123
+ Load embedding model.
124
+
125
+ Returns:
126
+ Info message
127
+ """
128
+ logger.info(f"Loading embedding model: {self.config['name']}")
129
+
130
+ try:
131
+ self.llm = Llama.from_pretrained(
132
+ repo_id=self.config["repo_id"],
133
+ filename=self.config["filename"],
134
+ n_ctx=self.config["max_context"],
135
+ n_batch=512,
136
+ n_threads=self.n_threads,
137
+ n_threads_batch=self.n_threads,
138
+ n_gpu_layers=0, # CPU only for embeddings
139
+ verbose=False,
140
+ embedding=True, # Enable embedding mode
141
+ )
142
+
143
+ msg = f"✅ Loaded: {self.config['name']} ({self.config['embedding_dim']}-dim)"
144
+ logger.info(msg)
145
+ return msg
146
+
147
+ except Exception as e:
148
+ error_msg = f"❌ Failed to load {self.model_key}: {str(e)}"
149
+ logger.error(error_msg, exc_info=True)
150
+ raise RuntimeError(error_msg) from e
151
+
152
+ def embed(self, text: str) -> np.ndarray:
153
+ """
154
+ Compute embedding for text.
155
+
156
+ Args:
157
+ text: Input text
158
+
159
+ Returns:
160
+ Embedding vector (numpy array)
161
+ """
162
+ if self.llm is None:
163
+ raise RuntimeError("Model not loaded. Call load() first.")
164
+
165
+ # Truncate text to max context
166
+ # Rough approximation: 1 token ≈ 4 chars
167
+ max_chars = self.config["max_context"] * 4
168
+ if len(text) > max_chars:
169
+ text = text[:max_chars]
170
+
171
+ # Get embedding
172
+ # llama_cpp returns a Python list; convert so the normalization below works
+ embedding = np.asarray(self.llm.embed(text), dtype=np.float32)
173
+
174
+ # Normalize vector
175
+ norm = np.linalg.norm(embedding)
176
+ if norm > 0:
177
+ embedding = embedding / norm
178
+
179
+ return embedding
180
+
181
+ def unload(self) -> None:
182
+ """Unload model and free memory."""
183
+ if self.llm:
184
+ logger.info(f"Unloading embedding model: {self.config['name']}")
185
+ del self.llm
186
+ self.llm = None
187
+
188
+ import gc
189
+ gc.collect()
190
+ time.sleep(0.5)
191
+
192
+
193
+ # ===== HELPER FUNCTIONS =====
194
+
195
+ @dataclass
196
+ class Window:
197
+ """Represents a transcript window for extraction."""
198
+ id: int
199
+ content: str
200
+ start_turn: int
201
+ end_turn: int
202
+ token_count: int
203
+
204
+
205
+ def format_progress_ticker(
206
+ current_window: int,
207
+ total_windows: int,
208
+ window_tokens: int,
209
+ max_tokens: int,
210
+ items_found: Dict[str, int],
211
+ tokens_per_sec: float,
212
+ eta_seconds: int,
213
+ current_snippet: str
214
+ ) -> str:
215
+ """
216
+ Format progress ticker for extraction UI.
217
+
218
+ Args:
219
+ current_window: Current window number (1-indexed)
220
+ total_windows: Total number of windows
221
+ window_tokens: Tokens in current window
222
+ max_tokens: Maximum tokens (for percentage)
223
+ items_found: Dict of {category: count}
224
+ tokens_per_sec: Generation speed
225
+ eta_seconds: Estimated time to completion
226
+ current_snippet: Last extracted item (truncated)
227
+
228
+ Returns:
229
+ Formatted ticker string
230
+ """
231
+ # Progress bar
232
+ progress_pct = (current_window / total_windows) * 100
233
+ bar_width = 20
234
+ filled = int(bar_width * progress_pct / 100)
235
+ bar = "█" * filled + "░" * (bar_width - filled)
236
+
237
+ # Item counts
238
+ action_items = items_found.get("action_items", 0)
239
+ decisions = items_found.get("decisions", 0)
240
+ key_points = items_found.get("key_points", 0)
241
+ questions = items_found.get("open_questions", 0)
242
+ total_items = action_items + decisions + key_points + questions
243
+
244
+ # ETA formatting
245
+ if eta_seconds > 60:
246
+ eta_str = f"{eta_seconds // 60}m {eta_seconds % 60}s"
247
+ else:
248
+ eta_str = f"{eta_seconds}s"
249
+
250
+ # Truncate snippet
251
+ snippet = current_snippet[:60] + "..." if len(current_snippet) > 60 else current_snippet
252
+
253
+ ticker = f"""
254
+ 🪟 Window {current_window}/{total_windows} | {bar} {progress_pct:.0f}%
255
+
256
+ 📊 Extracted: {total_items} items
257
+ ✓ Actions: {action_items} | Decisions: {decisions} | Points: {key_points} | Questions: {questions}
258
+
259
+ ⚡ Speed: {tokens_per_sec:.1f} tok/s | ETA: {eta_str}
260
+ 📝 Latest: {snippet}
261
+ """
262
+
263
+ return ticker.strip()
264
+
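The bar rendering above reduces to a small pure function (a sketch with a hypothetical name, shown here only to make the rounding behavior explicit):

```python
def progress_bar(current, total, width=20):
    """Block-character progress bar matching the ticker's rendering."""
    pct = (current / total) * 100
    filled = int(width * pct / 100)  # truncates, so the bar never overshoots
    return "█" * filled + "░" * (width - filled), pct
```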
265
+
266
+ def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
267
+ """
268
+ Compute cosine similarity between two vectors.
269
+
270
+ Args:
271
+ vec1: First vector (normalized)
272
+ vec2: Second vector (normalized)
273
+
274
+ Returns:
275
+ Cosine similarity in [-1.0, 1.0]; typically non-negative for text embeddings
276
+ """
277
+ # Vectors should already be normalized, but ensure it
278
+ dot_product = np.dot(vec1, vec2)
279
+ return float(dot_product)
280
+
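On top of this similarity, the Stage-2 deduplication can be sketched as a greedy keep-first filter (a simplified sketch with a toy `embed` callable; the real `deduplicate_items` works per category and its exact signature may differ):

```python
import numpy as np

def greedy_dedup(texts, embed, threshold=0.85):
    """Keep each text only if its (normalized) embedding stays below
    `threshold` cosine similarity to every already-kept text."""
    kept, kept_vecs = [], []
    for t in texts:
        v = np.asarray(embed(t), dtype=np.float32)
        v = v / np.linalg.norm(v)  # normalize so dot product == cosine
        if all(float(np.dot(v, k)) < threshold for k in kept_vecs):
            kept.append(t)
            kept_vecs.append(v)
    return kept
```

Raising the threshold slider makes this filter stricter: near-duplicates need to be almost identical before they are dropped.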
281
+
282
+ # ===== JSON PARSING HELPERS =====
283
+
284
+ def _try_parse_extraction_json(text: str) -> Optional[Dict[str, List[str]]]:
285
+ """
286
+ Attempt to parse extraction JSON from LLM output.
287
+
288
+ Args:
289
+ text: Raw LLM output
290
+
291
+ Returns:
292
+ Parsed dict or None if invalid
293
+ """
294
+ # Remove markdown code blocks
295
+ text = re.sub(r'```json\s*', '', text)
296
+ text = re.sub(r'```\s*$', '', text)
297
+ text = text.strip()
298
+
299
+ try:
300
+ data = json.loads(text)
301
+
302
+ # Validate schema
303
+ required_keys = {"action_items", "decisions", "key_points", "open_questions"}
304
+ if not isinstance(data, dict) or not required_keys.issubset(data.keys()):
305
+ return None
306
+
307
+ # Validate all values are lists
308
+ for key in required_keys:
309
+ if not isinstance(data[key], list):
310
+ return None
311
+
312
+ return data
313
+
314
+ except json.JSONDecodeError:
315
+ return None
316
+
317
+
318
+ def _sample_llm_response(text: str, max_chars: int = 400) -> str:
319
+ """Sample LLM response for trace logging."""
320
+ if not text:
321
+ return ""
322
+ return text[:max_chars] if len(text) > max_chars else text
323
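For reference, the fence-stripping JSON validation above can be exercised standalone. This sketch re-implements the helper's logic in isolation (the `parse_extraction` name and sample payload are illustrative, not imports from the app):

```python
import json
import re

def parse_extraction(text):
    # Strip markdown code fences, then parse and validate the schema.
    text = text.strip()
    text = re.sub(r'^```(?:json)?\s*', '', text)
    text = re.sub(r'```\s*$', '', text)
    required = {"action_items", "decisions", "key_points", "open_questions"}
    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not required.issubset(data):
        return None
    if not all(isinstance(data[k], list) for k in required):
        return None
    return data

raw = ('```json\n'
       '{"action_items": ["Ship v2"], "decisions": [], '
       '"key_points": [], "open_questions": []}\n'
       '```')
print(parse_extraction(raw)["action_items"])  # ['Ship v2']
```

Anchoring the opening-fence regex (`^```...`) keeps the substitution from eating backticks that happen to appear inside the JSON payload itself.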
+
+
+ # ===== CORE PIPELINE FUNCTIONS =====
+
+ def stream_extract_from_window(
+     extraction_llm: Llama,
+     window: Window,
+     window_id: int,
+     total_windows: int,
+     tracer: Any,
+     tokenizer: NativeTokenizer,
+     model_config: Dict[str, Any],
+     enable_reasoning: bool = False
+ ) -> Generator[Tuple[str, str, Dict[str, List[str]], bool], None, None]:
+     """
+     Stream extraction from a single window with live progress + optional reasoning.
+
+     Yields:
+         (ticker_text, thinking_text, partial_items, is_complete)
+         - ticker_text: Progress ticker for UI
+         - thinking_text: Reasoning/thinking blocks (if the model supports them)
+         - partial_items: Current extracted items
+         - is_complete: True on the final yield
+     """
+     # Auto-detect language from window content
+     has_cjk = bool(re.search(r'[\u4e00-\u9fff]', window.content))
+     output_language = "zh-TW" if has_cjk else "en"
+
+     supports_reasoning = model_config.get("supports_reasoning", False)
+     supports_toggle = model_config.get("supports_toggle", False)
+
+     # Build system prompt
+     if output_language == "zh-TW":
+         reasoning_inst = "使用推理能力分析後提取。" if (supports_toggle and enable_reasoning) else ""
+         system_prompt = f"""你是會議分析助手。{reasoning_inst}
+
+ 僅輸出 JSON:
+ {{
+   "action_items": ["任務", ...],
+   "decisions": ["決策", ...],
+   "key_points": ["要點", ...],
+   "open_questions": ["問題", ...]
+ }}"""
+     else:
+         reasoning_inst = "Use reasoning before extracting." if (supports_toggle and enable_reasoning) else ""
+         system_prompt = f"""You are a meeting assistant. {reasoning_inst}
+
+ Output ONLY JSON:
+ {{
+   "action_items": ["Task", ...],
+   "decisions": ["Decision", ...],
+   "key_points": ["Point", ...],
+   "open_questions": ["Question", ...]
+ }}"""
+
+     user_prompt = f"Transcript:\n\n{window.content}"
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": user_prompt}
+     ]
+
+     # Stream extraction
+     full_response = ""
+     thinking_content = ""
+     start_time = time.time()
+     first_token_time = None
+     token_count = 0
+
+     try:
+         settings = model_config["inference_settings"]
+         stream = extraction_llm.create_chat_completion(
+             messages=messages,
+             max_tokens=1024,
+             temperature=settings["temperature"],
+             top_p=settings["top_p"],
+             top_k=settings["top_k"],
+             repeat_penalty=settings["repeat_penalty"],
+             stream=True,
+         )
+
+         for chunk in stream:
+             if 'choices' in chunk and len(chunk['choices']) > 0:
+                 delta = chunk['choices'][0].get('delta', {})
+                 content = delta.get('content', '')
+
+                 if content:
+                     if first_token_time is None:
+                         first_token_time = time.time()
+
+                     token_count += 1
+                     full_response += content
+
+                     # Parse thinking blocks if reasoning enabled
+                     if enable_reasoning and supports_reasoning:
+                         thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_response, re.DOTALL)
+                         if thinking_match:
+                             thinking_content = thinking_match.group(1).strip()
+                             json_text = full_response[:thinking_match.start()] + full_response[thinking_match.end():]
+                         else:
+                             json_text = full_response
+                     else:
+                         json_text = full_response
+
+                     # Try to parse JSON
+                     partial_items = _try_parse_extraction_json(json_text)
+                     if not partial_items:
+                         partial_items = {"action_items": [], "decisions": [], "key_points": [], "open_questions": []}
+
+                     # Calculate metrics
+                     elapsed = time.time() - start_time
+                     tps = token_count / elapsed if elapsed > 0 else 0
+                     eta = int((1024 - token_count) / tps) if tps > 0 else 0
+
+                     # Get item counts
+                     items_found = {k: len(v) for k, v in partial_items.items()}
+
+                     # Get last item as snippet
+                     last_item = ""
+                     for cat in ["action_items", "decisions", "key_points", "open_questions"]:
+                         if partial_items.get(cat):
+                             last_item = partial_items[cat][-1]
+                             break
+
+                     # Format ticker
+                     ticker = format_progress_ticker(
+                         current_window=window_id,
+                         total_windows=total_windows,
+                         window_tokens=window.token_count,
+                         max_tokens=4096,
+                         items_found=items_found,
+                         tokens_per_sec=tps,
+                         eta_seconds=eta,
+                         current_snippet=last_item
+                     )
+
+                     yield (ticker, thinking_content, partial_items, False)
+
+         # Final parse
+         if enable_reasoning and supports_reasoning:
+             thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_response, re.DOTALL)
+             if thinking_match:
+                 thinking_content = thinking_match.group(1).strip()
+                 json_text = full_response[:thinking_match.start()] + full_response[thinking_match.end():]
+             else:
+                 json_text = full_response
+         else:
+             json_text = full_response
+
+         final_items = _try_parse_extraction_json(json_text)
+
+         if not final_items:
+             error_msg = f"Failed to parse JSON from window {window_id}"
+             tracer.log_extraction(
+                 window_id=window_id,
+                 extraction=None,
+                 llm_response=_sample_llm_response(full_response),
+                 error=error_msg
+             )
+             raise ValueError(error_msg)
+
+         # Log success
+         tracer.log_extraction(
+             window_id=window_id,
+             extraction=final_items,
+             llm_response=_sample_llm_response(full_response),
+             thinking=_sample_llm_response(thinking_content) if thinking_content else None,
+             error=None
+         )
+
+         # Final ticker
+         elapsed = time.time() - start_time
+         tps = token_count / elapsed if elapsed > 0 else 0
+         items_found = {k: len(v) for k, v in final_items.items()}
+
+         ticker = format_progress_ticker(
+             current_window=window_id,
+             total_windows=total_windows,
+             window_tokens=window.token_count,
+             max_tokens=4096,
+             items_found=items_found,
+             tokens_per_sec=tps,
+             eta_seconds=0,
+             current_snippet="✅ Extraction complete"
+         )
+
+         yield (ticker, thinking_content, final_items, True)
+
+     except Exception as e:
+         tracer.log_extraction(
+             window_id=window_id,
+             extraction=None,
+             llm_response=_sample_llm_response(full_response) if full_response else "",
+             error=str(e)
+         )
+         raise
+
+
+ def deduplicate_items(
+     all_items: Dict[str, List[str]],
+     embedding_model: EmbeddingModel,
+     similarity_threshold: float,
+     tracer: Any
+ ) -> Dict[str, List[str]]:
+     """
+     Deduplicate items across all categories using embeddings.
+
+     Args:
+         all_items: Dict of {category: [items]}
+         embedding_model: Loaded embedding model
+         similarity_threshold: Cosine similarity threshold (0.0-1.0)
+         tracer: Tracer instance
+
+     Returns:
+         Deduplicated dict of {category: [items]}
+     """
+     deduplicated = {}
+
+     for category, items in all_items.items():
+         if not items:
+             deduplicated[category] = []
+             continue
+
+         original_count = len(items)
+
+         # Compute embeddings for all items
+         embeddings = []
+         for item in items:
+             emb = embedding_model.embed(item)
+             embeddings.append(emb)
+
+         # Mark duplicates
+         keep_indices = []
+         for i in range(len(items)):
+             is_duplicate = False
+
+             # Compare with all previously kept items
+             for j in keep_indices:
+                 similarity = cosine_similarity(embeddings[i], embeddings[j])
+                 if similarity >= similarity_threshold:
+                     is_duplicate = True
+                     break
+
+             if not is_duplicate:
+                 keep_indices.append(i)
+
+         # Keep only unique items
+         unique_items = [items[i] for i in keep_indices]
+         deduplicated[category] = unique_items
+
+         # Log deduplication
+         duplicates_removed = original_count - len(unique_items)
+         tracer.log_deduplication(
+             category=category,
+             original_count=original_count,
+             deduplicated_count=len(unique_items),
+             duplicates_removed=duplicates_removed,
+             similarity_threshold=similarity_threshold,
+             embedding_model=embedding_model.model_key
+         )
+
+         logger.info(f"Dedup {category}: {original_count} → {len(unique_items)} ({duplicates_removed} removed)")
+
+     return deduplicated
+
+
+ def stream_synthesize_executive_summary(
+     synthesis_llm: Llama,
+     deduplicated_items: Dict[str, List[str]],
+     model_config: Dict[str, Any],
+     output_language: str,
+     enable_reasoning: bool,
+     max_tokens: int,
+     tracer: Any
+ ) -> Generator[Tuple[str, str, bool], None, None]:
+     """
+     Stream synthesis of an executive summary from deduplicated items.
+
+     Yields:
+         (summary_text, thinking_text, is_complete)
+     """
+     # Build synthesis prompt
+     item_counts = {k: len(v) for k, v in deduplicated_items.items()}
+
+     # Format items for prompt
+     items_text = ""
+     for category, items in deduplicated_items.items():
+         if items:
+             category_label = {
+                 "action_items": "Action Items" if output_language == "en" else "行動項目",
+                 "decisions": "Decisions" if output_language == "en" else "決策",
+                 "key_points": "Key Points" if output_language == "en" else "關鍵要點",
+                 "open_questions": "Open Questions" if output_language == "en" else "未解決問題"
+             }.get(category, category)
+
+             items_text += f"\n{category_label}:\n"
+             for i, item in enumerate(items, 1):
+                 items_text += f"{i}. {item}\n"
+
+     if output_language == "zh-TW":
+         system_prompt = "你是執行摘要專家。生成簡潔的執行摘要。"
+         user_prompt = f"基於以下結構化資訊生成執行摘要:\n{items_text}\n\n請提供簡明的執行摘要。"
+     else:
+         system_prompt = "You are an executive summary expert. Generate concise summaries."
+         user_prompt = f"Generate an executive summary based on these structured items:\n{items_text}\n\nProvide a concise executive summary."
+
+     messages = [
+         {"role": "system", "content": system_prompt},
+         {"role": "user", "content": user_prompt}
+     ]
+
+     # Stream synthesis
+     full_summary = ""
+     thinking_content = ""
+
+     try:
+         settings = model_config["inference_settings"]
+         stream = synthesis_llm.create_chat_completion(
+             messages=messages,
+             max_tokens=max_tokens,
+             temperature=settings["temperature"],
+             top_p=settings["top_p"],
+             top_k=settings["top_k"],
+             repeat_penalty=settings["repeat_penalty"],
+             stream=True,
+         )
+
+         for chunk in stream:
+             if 'choices' in chunk and len(chunk['choices']) > 0:
+                 delta = chunk['choices'][0].get('delta', {})
+                 content = delta.get('content', '')
+
+                 if content:
+                     full_summary += content
+
+                     # Parse thinking if reasoning enabled
+                     if enable_reasoning and model_config.get("supports_reasoning"):
+                         thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_summary, re.DOTALL)
+                         if thinking_match:
+                             thinking_content = thinking_match.group(1).strip()
+                             summary_text = full_summary[:thinking_match.start()] + full_summary[thinking_match.end():]
+                         else:
+                             summary_text = full_summary
+                     else:
+                         summary_text = full_summary
+
+                     yield (summary_text, thinking_content, False)
+
+         # Final parse
+         if enable_reasoning and model_config.get("supports_reasoning"):
+             thinking_match = re.search(r'<think(?:ing)?>(.*?)</think(?:ing)?>', full_summary, re.DOTALL)
+             if thinking_match:
+                 thinking_content = thinking_match.group(1).strip()
+                 summary_text = full_summary[:thinking_match.start()] + full_summary[thinking_match.end():]
+             else:
+                 summary_text = full_summary
+         else:
+             summary_text = full_summary
+
+         # Log synthesis
+         tracer.log_synthesis(
+             synthesis_model=model_config["name"],
+             input_item_counts=item_counts,
+             output_summary=_sample_llm_response(summary_text),
+             thinking=_sample_llm_response(thinking_content) if thinking_content else None,
+             error=None
+         )
+
+         yield (summary_text, thinking_content, True)
+
+     except Exception as e:
+         tracer.log_synthesis(
+             synthesis_model=model_config["name"],
+             input_item_counts=item_counts,
+             output_summary="",
+             thinking=None,
+             error=str(e)
+         )
+         raise
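The greedy keep-first strategy used by `deduplicate_items` can be illustrated in isolation with mock unit-normalized vectors (the sample items, embeddings, and threshold below are illustrative stand-ins, not real model output):

```python
import numpy as np

def cosine_similarity(v1, v2):
    # Vectors are assumed unit-normalized, so the dot product is the cosine.
    return float(np.dot(v1, v2))

def dedup(items, embeddings, threshold):
    # Greedy keep-first: an item is dropped if it is at least `threshold`
    # similar to any previously kept item.
    keep = []
    for i in range(len(items)):
        if not any(cosine_similarity(embeddings[i], embeddings[j]) >= threshold
                   for j in keep):
            keep.append(i)
    return [items[i] for i in keep]

items = ["ship the report", "send the report", "book meeting room"]
# Mock embeddings: the first two are nearly parallel, the third orthogonal.
embs = [np.array([1.0, 0.0]),
        np.array([0.999, 0.045]),
        np.array([0.0, 1.0])]
print(dedup(items, embs, threshold=0.85))
# → ['ship the report', 'book meeting room']
```

Note the order dependence of keep-first: the earliest occurrence of a near-duplicate group survives, which is why extraction order (window order) determines which phrasing reaches synthesis.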
meeting_summarizer/trace.py ADDED
@@ -0,0 +1,197 @@
+ """
+ Trace Logger for Advanced Mode Pipeline
+
+ Logs extraction, deduplication, and synthesis operations for debugging
+ and an audit trail. Supports JSONL format for easy parsing.
+ """
+
+ import json
+ import time
+ from typing import Dict, List, Any, Optional
+ from datetime import datetime
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+
+ class Tracer:
+     """Trace logger for the Advanced Mode 3-stage pipeline."""
+
+     def __init__(self, enabled: bool = True):
+         """
+         Initialize tracer.
+
+         Args:
+             enabled: Whether to enable trace logging
+         """
+         self.enabled = enabled
+         self.trace_entries: List[Dict[str, Any]] = []
+         self.start_time = time.time()
+
+     def log_extraction(
+         self,
+         window_id: int,
+         extraction: Optional[Dict[str, List[str]]],
+         llm_response: str,
+         thinking: Optional[str] = None,
+         error: Optional[str] = None
+     ) -> None:
+         """
+         Log extraction operation for a single window.
+
+         Args:
+             window_id: Window identifier
+             extraction: Extracted items dict (action_items, decisions, key_points, open_questions)
+             llm_response: Sampled LLM response (first 400 chars)
+             thinking: Sampled thinking/reasoning content (if applicable)
+             error: Error message if extraction failed
+         """
+         if not self.enabled:
+             return
+
+         entry = {
+             "stage": "extraction",
+             "timestamp": datetime.now().isoformat(),
+             "elapsed_seconds": round(time.time() - self.start_time, 2),
+             "window_id": window_id,
+             "success": extraction is not None and error is None,
+             "error": error,
+             "extraction": extraction,
+             "llm_response_sample": llm_response[:400] if llm_response else None,
+             "thinking_sample": thinking[:400] if thinking else None,
+         }
+
+         self.trace_entries.append(entry)
+         logger.debug(f"[Trace] Extraction window {window_id}: {entry['success']}")
+
+     def log_deduplication(
+         self,
+         category: str,
+         original_count: int,
+         deduplicated_count: int,
+         duplicates_removed: int,
+         similarity_threshold: float,
+         embedding_model: str
+     ) -> None:
+         """
+         Log deduplication operation for a category.
+
+         Args:
+             category: Category name (action_items, decisions, etc.)
+             original_count: Number of items before deduplication
+             deduplicated_count: Number of items after deduplication
+             duplicates_removed: Number of duplicates removed
+             similarity_threshold: Similarity threshold used
+             embedding_model: Embedding model used
+         """
+         if not self.enabled:
+             return
+
+         entry = {
+             "stage": "deduplication",
+             "timestamp": datetime.now().isoformat(),
+             "elapsed_seconds": round(time.time() - self.start_time, 2),
+             "category": category,
+             "original_count": original_count,
+             "deduplicated_count": deduplicated_count,
+             "duplicates_removed": duplicates_removed,
+             "duplicate_rate": round(duplicates_removed / original_count * 100, 1) if original_count > 0 else 0.0,
+             "similarity_threshold": similarity_threshold,
+             "embedding_model": embedding_model,
+         }
+
+         self.trace_entries.append(entry)
+         logger.debug(f"[Trace] Deduplication {category}: {original_count} → {deduplicated_count} ({duplicates_removed} removed)")
105
+
106
+ def log_synthesis(
107
+ self,
108
+ synthesis_model: str,
109
+ input_item_counts: Dict[str, int],
110
+ output_summary: str,
111
+ thinking: Optional[str] = None,
112
+ error: Optional[str] = None
113
+ ) -> None:
114
+ """
115
+ Log synthesis operation.
116
+
117
+ Args:
118
+ synthesis_model: Model key used for synthesis
119
+ input_item_counts: Dict of category counts fed to synthesis
120
+ output_summary: Generated summary (sampled)
121
+ thinking: Thinking/reasoning content (sampled, if applicable)
122
+ error: Error message if synthesis failed
123
+ """
124
+ if not self.enabled:
125
+ return
126
+
127
+ entry = {
128
+ "stage": "synthesis",
129
+ "timestamp": datetime.now().isoformat(),
130
+ "elapsed_seconds": round(time.time() - self.start_time, 2),
131
+ "synthesis_model": synthesis_model,
132
+ "input_item_counts": input_item_counts,
133
+ "success": error is None,
134
+ "error": error,
135
+ "output_summary_sample": output_summary[:400] if output_summary else None,
136
+ "thinking_sample": thinking[:400] if thinking else None,
137
+ }
138
+
139
+ self.trace_entries.append(entry)
140
+ logger.debug(f"[Trace] Synthesis: {entry['success']}")
141
+
142
+ def get_trace_jsonl(self) -> str:
143
+ """
144
+ Get trace entries as JSONL string.
145
+
146
+ Returns:
147
+ JSONL string (one JSON object per line)
148
+ """
149
+ if not self.enabled:
150
+ return ""
151
+
152
+ return "\n".join(json.dumps(entry, ensure_ascii=False) for entry in self.trace_entries)
153
+
154
+ def get_trace_json(self) -> List[Dict[str, Any]]:
155
+ """
156
+ Get trace entries as list of dicts.
157
+
158
+ Returns:
159
+ List of trace entry dicts
160
+ """
161
+ if not self.enabled:
162
+ return []
163
+
164
+ return self.trace_entries
165
+
166
+ def get_summary_stats(self) -> Dict[str, Any]:
167
+ """
168
+ Get summary statistics from trace.
169
+
170
+ Returns:
171
+ Dict with pipeline statistics
172
+ """
173
+ if not self.enabled or not self.trace_entries:
174
+ return {}
175
+
176
+ extraction_entries = [e for e in self.trace_entries if e["stage"] == "extraction"]
177
+ dedup_entries = [e for e in self.trace_entries if e["stage"] == "deduplication"]
178
+ synthesis_entries = [e for e in self.trace_entries if e["stage"] == "synthesis"]
179
+
180
+ total_extracted = sum(
181
+ sum(e["extraction"].values()) if e.get("extraction") else 0
182
+ for e in extraction_entries
183
+ )
184
+
185
+ total_deduplicated = sum(e["deduplicated_count"] for e in dedup_entries)
186
+ total_duplicates = sum(e["duplicates_removed"] for e in dedup_entries)
187
+
188
+ return {
189
+ "total_windows": len(extraction_entries),
190
+ "successful_extractions": sum(1 for e in extraction_entries if e["success"]),
191
+ "total_items_extracted": total_extracted,
192
+ "total_items_after_dedup": total_deduplicated,
193
+ "total_duplicates_removed": total_duplicates,
194
+ "duplicate_rate": round(total_duplicates / total_extracted * 100, 1) if total_extracted > 0 else 0.0,
195
+ "synthesis_success": synthesis_entries[0]["success"] if synthesis_entries else False,
196
+ "total_elapsed_seconds": round(time.time() - self.start_time, 2),
197
+ }
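Since `get_trace_jsonl()` emits one JSON object per line, downstream tooling can aggregate pipeline stats with nothing but the standard library. A minimal sketch of such a consumer (the hand-written entries below are stand-ins for real trace output, reduced to the fields the aggregation reads):

```python
import json

# Stand-in trace, in the same shape get_trace_jsonl() produces.
trace_jsonl = "\n".join(json.dumps(e, ensure_ascii=False) for e in [
    {"stage": "extraction", "window_id": 1, "success": True,
     "extraction": {"action_items": ["a"], "decisions": [],
                    "key_points": ["k"], "open_questions": []}},
    {"stage": "deduplication", "category": "action_items",
     "original_count": 3, "deduplicated_count": 2, "duplicates_removed": 1},
    {"stage": "synthesis", "success": True},
])

# One JSON object per line; parse and aggregate per stage.
entries = [json.loads(line) for line in trace_jsonl.splitlines() if line]
extracted = sum(
    sum(len(v) for v in e["extraction"].values())
    for e in entries if e["stage"] == "extraction" and e.get("extraction")
)
removed = sum(e["duplicates_removed"] for e in entries
              if e["stage"] == "deduplication")
print(extracted, removed)  # 2 1
```

Note that summing item counts must take `len()` of each category list; the `extraction` field holds lists of strings, not counts.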