| # Implementation Plan: Debug System Prompt & Custom GGUF Loader | |
| ## Feature 1: Debug System Prompt Display | |
| ### Purpose | |
| Show users the exact system prompt that will be sent to the LLM for transparency and debugging. | |
| ### Current State | |
| The system prompt is built inline in `summarize_streaming()` (lines ~903-916) but never exposed to the UI. | |
| ### Implementation Plan | |
| #### Step 1: Extract Prompt Builder Function | |
| **Location**: Add new function in `app.py` around line 880 | |
| ```python | |
| def build_system_prompt(length: str, format_type: str, language: str, enable_reasoning: bool, supports_think_tags: bool) -> str: | |
|     """Build the system prompt that will be sent to the LLM. | |
|     Args: | |
|         length: "tiny", "short", "medium", "long" | |
|         format_type: "bullets", "paragraph", "structured" | |
|         language: "en", "zh-TW" | |
|         enable_reasoning: Whether reasoning mode is enabled | |
|         supports_think_tags: Whether the model supports <think> tags | |
|     Returns: | |
|         The complete system prompt string | |
|     """ | |
|     # Length configurations (existing) | |
|     length_prompts = { | |
|         "tiny": f"""Provide a {format_type} summary in 2-3 sentences covering: | |
| - Main topic and key points | |
| - Most important finding or conclusion | |
| - Practical takeaway""", | |
|         "short": f"""Provide a {format_type} summary in 3-5 sentences covering: | |
| - Main topic and purpose | |
| - 2-3 key points or findings | |
| - Conclusion or recommendation""", | |
|         "medium": f"""Provide a {format_type} summary in 1-2 paragraphs covering: | |
| - Main topic and context | |
| - Key points with brief explanations | |
| - Supporting details | |
| - Conclusions and recommendations""", | |
|         "long": f"""Provide a comprehensive {format_type} summary in 3-4 paragraphs covering: | |
| - Background and context | |
| - All major points with detailed explanations | |
| - Supporting evidence and examples | |
| - Different perspectives if present | |
| - Conclusions, implications, and actionable recommendations""", | |
|     } | |
|     base_prompt = length_prompts.get(length, length_prompts["medium"]) | |
|     if language == "zh-TW": | |
|         if enable_reasoning and supports_think_tags: | |
|             system_content = f"You are a helpful assistant that summarizes transcripts. First think through the content in <thinking> tags, then provide the summary.\n\n{base_prompt}\n\nPlease respond in Traditional Chinese (Taiwan)." | |
|         else: | |
|             system_content = f"You are a helpful assistant that summarizes transcripts.\n\n{base_prompt}\n\nPlease respond in Traditional Chinese (Taiwan)." | |
|     else: | |
|         if enable_reasoning and supports_think_tags: | |
|             system_content = f"You are a helpful assistant that summarizes transcripts. First think through the content in <thinking> tags, then provide the summary.\n\n{base_prompt}" | |
|         else: | |
|             system_content = f"You are a helpful assistant that summarizes transcripts.\n\n{base_prompt}" | |
|     return system_content | |
| ``` | |
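The four branches in Step 1 differ only in two optional pieces: the reasoning instruction and the language suffix. A minimal sketch of the same composition built from parts, assuming the Step 1 wording is kept verbatim; `compose_system_prompt` is a hypothetical name, and `base_prompt` stands in for the `length_prompts` lookup:

```python
def compose_system_prompt(base_prompt: str, language: str,
                          enable_reasoning: bool, supports_think_tags: bool) -> str:
    """Compose the prompt from optional parts (hypothetical helper, not in app.py)."""
    intro = "You are a helpful assistant that summarizes transcripts."
    if enable_reasoning and supports_think_tags:
        # Same reasoning instruction as the Step 1 branches.
        intro += " First think through the content in <thinking> tags, then provide the summary."
    parts = [intro, base_prompt]
    if language == "zh-TW":
        # Language suffix is appended last, as in Step 1.
        parts.append("Please respond in Traditional Chinese (Taiwan).")
    return "\n\n".join(parts)


plain = compose_system_prompt("Summarize briefly.", "en", False, False)
reasoning_zh = compose_system_prompt("Summarize briefly.", "zh-TW", True, True)
```

Either form works for the debug display; the composed form just makes it harder for the four strings to drift apart.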
| #### Step 2: Refactor summarize_streaming() | |
| **Location**: Lines ~903-916 in `app.py` | |
| Replace inline prompt building with call to `build_system_prompt()`: | |
| ```python | |
| # OLD CODE (to replace): | |
| length_prompts = {...}  # Remove this dict | |
| # ... if language == "zh-TW": logic ... | |
| # NEW CODE: | |
| system_content = build_system_prompt( | |
|     length=length, | |
|     format_type=format_type, | |
|     language=language, | |
|     enable_reasoning=enable_reasoning, | |
|     supports_think_tags=supports_think_tags, | |
| ) | |
| ``` | |
| #### Step 3: Add UI Component | |
| **Location**: In the right column interface, after the summary output (around line 1370) | |
| Add a collapsible accordion: | |
| ```python | |
| with gr.Accordion("Debug: System Prompt", open=False): | |
|     system_prompt_debug = gr.Textbox( | |
|         label="System Prompt (Read-Only)", | |
|         lines=10, | |
|         max_lines=20, | |
|         interactive=False, | |
|         show_copy_button=True, | |
|         value="Click 'Generate Summary' to see the system prompt that will be used.", | |
|     ) | |
| ``` | |
| #### Step 4: Update Event Handlers | |
| **Location**: In `generate_summary()` function | |
| Pass the built system prompt to the output: | |
| ```python | |
| def generate_summary(model_key, thread_config, custom_threads, transcript_text, | |
|                      summary_length, output_format, language, enable_reasoning, | |
|                      enable_streaming, progress=gr.Progress()): | |
|     # ... existing code ... | |
|     # Build system prompt for display | |
|     selected_model = AVAILABLE_MODELS[model_key] | |
|     supports_think_tags = selected_model.get("supports_toggle", False) or selected_model.get("supports_reasoning", False) | |
|     system_prompt_preview = build_system_prompt( | |
|         length=summary_length, | |
|         format_type=output_format, | |
|         language=language, | |
|         enable_reasoning=enable_reasoning, | |
|         supports_think_tags=supports_think_tags, | |
|     ) | |
|     # ... rest of summarization logic ... | |
|     # Return the system prompt along with other outputs | |
|     yield final_summary, thinking_text, json_output, system_prompt_preview, status_msg | |
| ``` | |
| #### Step 5: Update Gradio Outputs | |
| **Location**: Line ~1435 | |
| Add `system_prompt_debug` to outputs list: | |
| ```python | |
| outputs=[summary_output, thinking_output, json_output, system_prompt_debug, status_message] | |
| ``` | |
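Because `generate_summary()` is a generator, every `yield` has to supply one value per component in `outputs`, in the same order. A minimal sketch of that contract without Gradio, assuming the five outputs listed above; `generate_summary_sketch` and its placeholder status strings are illustrative only. It also yields the prompt before any summary text, so the debug box could fill in as soon as generation starts:

```python
from typing import Iterator, List, Tuple

# Order mirrors the outputs list: summary, thinking, JSON, system prompt, status.
OutputRow = Tuple[str, str, str, str, str]


def generate_summary_sketch(system_prompt: str, chunks: List[str]) -> Iterator[OutputRow]:
    """Yield one 5-tuple per UI update (illustrative stand-in, not the real function)."""
    # First update: nothing generated yet, but the prompt is already known.
    yield "", "", "", system_prompt, "Generating..."
    summary = ""
    for chunk in chunks:
        summary += chunk
        yield summary, "", "", system_prompt, "Generating..."
    # Final update carries the finished summary and a terminal status.
    yield summary, "", "", system_prompt, "Done"


rows = list(generate_summary_sketch("PROMPT", ["Hello", " world"]))
```

Any early-exit `yield` (for example on a load error) needs the same five positions, with empty strings for the outputs that have nothing to show.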
| --- | |
| ## Feature 2: Custom GGUF Loader from HuggingFace | |
| ### Purpose | |
| Allow users to load any GGUF model from HuggingFace, not just the predefined list. | |
| ### Implementation Plan | |
| #### Step 1: Add Custom Model Option | |
| **Location**: In AVAILABLE_MODELS dict (around line 120) | |
| Add as the last entry: | |
| ```python | |
| AVAILABLE_MODELS = { | |
|     # ... existing models ... | |
|     "custom_hf": { | |
|         "display": "Custom HF GGUF...", | |
|         "repo_id": None,  # Will be provided by user | |
|         "filename": None,  # Will be provided by user | |
|         "quantization": None, | |
|         "description": "Load any GGUF model from HuggingFace", | |
|         "size_mb": 0,  # Unknown | |
|         "n_gpu_layers": 0, | |
|         "n_ctx": 8192, | |
|         "max_tokens": 4096, | |
|         "supports_reasoning": False, | |
|         "supports_toggle": False, | |
|     }, | |
| } | |
| ``` | |
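Since `repo_id` and `filename` are `None` in the template entry, the user's values have to be merged in at load time. One way to do that without mutating the shared `AVAILABLE_MODELS` entry is to copy it first. A sketch under that assumption; `resolve_custom_config` is a hypothetical helper, and the dict below is a trimmed copy of the entry above:

```python
from typing import Any, Dict

# Trimmed copy of the "custom_hf" entry, for illustration only.
CUSTOM_TEMPLATE: Dict[str, Any] = {
    "display": "Custom HF GGUF...",
    "repo_id": None,
    "filename": None,
    "n_gpu_layers": 0,
    "n_ctx": 8192,
    "max_tokens": 4096,
}


def resolve_custom_config(template: Dict[str, Any], repo_id: str, filename: str) -> Dict[str, Any]:
    """Return a copy of the template with the user-supplied fields filled in."""
    config = dict(template)  # shallow copy is enough: all values are scalars
    config["repo_id"] = repo_id.strip()
    config["filename"] = filename.strip()
    return config


config = resolve_custom_config(
    CUSTOM_TEMPLATE, " unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF ", "*Q4_K_M.gguf"
)
```

Copying keeps the template reusable across requests, which matters when several users share one running Space.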
| #### Step 2: Add Custom Model UI Components | |
| **Location**: In the left column, after model dropdown (around line 1270) | |
| ```python | |
| # Custom model inputs (hidden by default) | |
| with gr.Group(visible=False) as custom_model_group: | |
|     gr.Markdown("### Custom HuggingFace Model") | |
|     custom_repo_id = gr.Textbox( | |
|         label="HuggingFace Repo ID", | |
|         placeholder="e.g., unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF", | |
|         info="The HuggingFace repository containing the GGUF file", | |
|     ) | |
|     custom_filename = gr.Textbox( | |
|         label="GGUF Filename Pattern", | |
|         placeholder="e.g., *Q4_K_M.gguf or exact filename", | |
|         info="Use * as wildcard or provide exact filename", | |
|     ) | |
|     custom_load_btn = gr.Button("Load Custom Model", variant="primary") | |
|     custom_error_message = gr.Textbox( | |
|         label="Status", | |
|         interactive=False, | |
|         visible=False, | |
|     ) | |
|     custom_retry_btn = gr.Button("Retry", variant="secondary", visible=False) | |
| ``` | |
| #### Step 3: Add Visibility Toggle Handler | |
| **Location**: Add new event handler around line 1490 | |
| ```python | |
| def update_custom_model_visibility(model_key): | |
|     """Show/hide custom model inputs based on selection.""" | |
|     is_custom = model_key == "custom_hf" | |
|     return gr.update(visible=is_custom) | |
| # Add event handler | |
| model_dropdown.change( | |
|     update_custom_model_visibility, | |
|     inputs=[model_dropdown], | |
|     outputs=[custom_model_group], | |
| ) | |
| ``` | |
| #### Step 4: Create Custom Model Loader Function | |
| **Location**: Add new function around line 710 | |
| ```python | |
| from typing import Optional, Tuple  # skip if app.py already imports these | |
| def load_custom_model(repo_id: str, filename: str, cpu_only: bool = False) -> Tuple[Optional[Llama], str]: | |
|     """Load a custom GGUF model from HuggingFace. | |
|     Args: | |
|         repo_id: HuggingFace repository ID | |
|         filename: Filename pattern or exact name | |
|         cpu_only: Whether to use CPU only | |
|     Returns: | |
|         Tuple of (model_instance, error_message) | |
|         If successful, error_message is empty string | |
|         If failed, model_instance is None | |
|     """ | |
|     if not repo_id or not filename: | |
|         return None, "❌ Error: Please provide both Repo ID and Filename" | |
|     # Validate repo_id format | |
|     if "/" not in repo_id: | |
|         return None, "❌ Error: Repo ID must be in format 'username/repo-name'" | |
|     try: | |
|         n_gpu_layers = 0 if cpu_only else -1  # -1 offloads all layers to the GPU | |
|         n_ctx = 8192  # Conservative default for custom models | |
|         n_batch = 512 | |
|         llm = Llama.from_pretrained( | |
|             repo_id=repo_id, | |
|             filename=filename, | |
|             n_gpu_layers=n_gpu_layers, | |
|             n_ctx=n_ctx, | |
|             n_batch=n_batch, | |
|             verbose=False, | |
|         ) | |
|         return llm, "" | |
|     except Exception as e: | |
|         # Map common failure messages to user-friendly hints | |
|         error_msg = str(e) | |
|         if "not found" in error_msg.lower(): | |
|             return None, f"❌ Error: Model or file not found. Check repo_id and filename.\nDetails: {error_msg}" | |
|         elif "permission" in error_msg.lower() or "access" in error_msg.lower(): | |
|             return None, f"❌ Error: Cannot access model. It may be private or gated.\nDetails: {error_msg}" | |
|         else: | |
|             return None, f"❌ Error loading model: {error_msg}" | |
| ``` | |
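The `filename` argument accepts either an exact name or a `*` wildcard. A sketch of how such a pattern can be resolved against a repository's file list with the standard-library `fnmatch` module; `resolve_gguf_filename` and the file names are made up for illustration, and whether `Llama.from_pretrained` resolves patterns the same way should be checked against the installed llama-cpp-python version:

```python
from fnmatch import fnmatch
from typing import List


def resolve_gguf_filename(pattern: str, repo_files: List[str]) -> str:
    """Return the single file matching a glob pattern, or raise ValueError."""
    matches = [name for name in repo_files if fnmatch(name, pattern)]
    if not matches:
        raise ValueError(f"No file matches {pattern!r}")
    if len(matches) > 1:
        # An ambiguous pattern is reported rather than silently picking one file.
        raise ValueError(f"Pattern {pattern!r} is ambiguous: {sorted(matches)}")
    return matches[0]


files = ["README.md", "model-Q4_K_M.gguf", "model-Q8_0.gguf"]  # hypothetical listing
chosen = resolve_gguf_filename("*Q4_K_M.gguf", files)
```

A pre-check like this would let the UI report "no match" or "ambiguous pattern" before any download starts, instead of relying on substring checks of the exception text.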
| #### Step 5: Add Custom Model Loading Handler | |
| **Location**: Add around line 1510 | |
| ```python | |
| def handle_custom_model_load(repo_id, filename, cpu_only): | |
|     """Handle custom model loading with error display and retry option.""" | |
|     llm, error = load_custom_model(repo_id, filename, cpu_only) | |
|     if llm is None: | |
|         # Show error and retry button | |
|         return ( | |
|             gr.update(visible=True, value=error),  # error_message | |
|             gr.update(visible=True),  # retry_btn | |
|             None,  # model_instance (store somewhere accessible) | |
|         ) | |
|     else: | |
|         # Success - hide error, show success message | |
|         return ( | |
|             gr.update(visible=True, value="✅ Model loaded successfully!"), | |
|             gr.update(visible=False),  # retry_btn | |
|             llm,  # Store model instance | |
|         ) | |
| custom_load_btn.click( | |
|     handle_custom_model_load, | |
|     inputs=[custom_repo_id, custom_filename, cpu_only_checkbox], | |
|     outputs=[custom_error_message, custom_retry_btn, model_state],  # model_state is gr.State() | |
| ) | |
| custom_retry_btn.click( | |
|     handle_custom_model_load, | |
|     inputs=[custom_repo_id, custom_filename, cpu_only_checkbox], | |
|     outputs=[custom_error_message, custom_retry_btn, model_state], | |
| ) | |
| ``` | |
| #### Step 6: Update Generate Summary for Custom Models | |
| **Location**: In `generate_summary()` function | |
| Modify to handle custom models: | |
| ```python | |
| def generate_summary(model_key, thread_config, custom_threads, transcript_text, | |
|                      summary_length, output_format, language, enable_reasoning, | |
|                      enable_streaming, custom_repo_id=None, custom_filename=None, | |
|                      progress=gr.Progress()): | |
|     if model_key == "custom_hf": | |
|         # Load custom model. cpu_only is not a parameter of this function; | |
|         # CPU-only is assumed here to match the HF Spaces default under Risk Mitigation. | |
|         llm, error = load_custom_model(custom_repo_id, custom_filename, cpu_only=True) | |
|         if llm is None: | |
|             # One value per output: summary, thinking, JSON, system prompt, status | |
|             yield "", "", "", "", error | |
|             return | |
|     else: | |
|         # Use predefined model | |
|         model_info = AVAILABLE_MODELS[model_key] | |
|         llm = load_model_from_config(model_info) | |
|     # ... rest of the function ... | |
| ``` | |
| #### Step 7: Update UI to Pass Custom Model Values | |
| **Location**: Line ~1429 | |
| Add custom inputs to the generate summary call: | |
| ```python | |
| generate_btn.click( | |
|     fn=generate_summary, | |
|     inputs=[ | |
|         model_dropdown, | |
|         thread_config, | |
|         custom_n_threads, | |
|         transcript_input, | |
|         summary_length, | |
|         output_format, | |
|         language, | |
|         reasoning_checkbox, | |
|         streaming_toggle, | |
|         custom_repo_id,  # NEW | |
|         custom_filename,  # NEW | |
|     ], | |
|     outputs=[...] | |
| ) | |
| ``` | |
| #### Step 8: Update generate_summary signature | |
| **Location**: Function definition around line 870 | |
| Update function signature to accept custom model parameters: | |
| ```python | |
| def generate_summary( | |
|     model_key: str, | |
|     thread_config: str, | |
|     custom_threads: int, | |
|     transcript_text: str, | |
|     summary_length: str, | |
|     output_format: str, | |
|     language: str, | |
|     enable_reasoning: bool, | |
|     enable_streaming: bool, | |
|     custom_repo_id: Optional[str] = None,  # NEW | |
|     custom_filename: Optional[str] = None,  # NEW | |
|     progress: gr.Progress = gr.Progress(), | |
| ) -> Generator: | |
| ``` | |
| #### Step 9: Update Model State Management | |
| **Location**: Add near other state declarations (around line 1250) | |
| ```python | |
| # Store loaded model to avoid reloading on each generation | |
| model_state = gr.State(None) | |
| ``` | |
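`model_state` only avoids reloads if the handler checks what is already loaded before calling the loader. A minimal sketch of that check, assuming the state holds both the model and the key it was loaded with; `get_or_load` and the injected `loader` callable are hypothetical, and a real version would pass `load_custom_model`:

```python
from typing import Any, Callable, List, Optional, Tuple

ModelKey = Tuple[str, str, bool]              # (repo_id, filename, cpu_only)
CachedModel = Optional[Tuple[ModelKey, Any]]  # what the state would hold


def get_or_load(state: CachedModel, key: ModelKey,
                loader: Callable[[str, str, bool], Tuple[Any, str]]
                ) -> Tuple[CachedModel, Any, str]:
    """Reuse the cached model when the key matches; otherwise call the loader."""
    if state is not None and state[0] == key:
        return state, state[1], ""
    model, error = loader(*key)
    if model is None:
        return state, None, error  # leave the previous state untouched on failure
    return (key, model), model, ""


calls: List[str] = []


def fake_loader(repo_id: str, filename: str, cpu_only: bool) -> Tuple[Any, str]:
    """Stand-in for load_custom_model that records each real load."""
    calls.append(repo_id)
    return object(), ""


key = ("user/repo", "*Q4_K_M.gguf", True)
state, first, _ = get_or_load(None, key, fake_loader)
state, second, _ = get_or_load(state, key, fake_loader)
```

In Gradio, the returned state would be written back through the `model_state` output so the next click sees it.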
| --- | |
| ## Implementation Order | |
| 1. **Feature 1 First** - Debug System Prompt (simpler, self-contained) | |
| - Step 1: Create `build_system_prompt()` function | |
| - Step 2: Refactor `summarize_streaming()` to use it | |
| - Step 3: Add UI accordion component | |
| - Steps 4-5: Update event handlers and Gradio outputs | |
| 2. **Feature 2 Second** - Custom GGUF Loader (more complex) | |
| - Step 1: Add "custom_hf" to AVAILABLE_MODELS | |
| - Step 2: Add UI components for custom model inputs | |
| - Step 3: Add visibility toggle handler | |
| - Step 4: Create `load_custom_model()` function | |
| - Step 5: Add load/retry handlers | |
| - Step 6: Update generate_summary for custom models | |
| - Step 7: Update UI inputs | |
| - Step 8: Update function signature | |
| - Step 9: Add model state management | |
| --- | |
| ## Testing Plan | |
| ### Feature 1 Tests | |
| 1. Select different models, verify system prompt updates correctly | |
| 2. Toggle reasoning mode, verify the `<thinking>` instruction appears only when the selected model supports reasoning | |
| 3. Change language, verify Traditional Chinese prompt appears | |
| 4. Change length/format, verify prompt content changes | |
| 5. Verify prompt is read-only and copyable | |
| ### Feature 2 Tests | |
| 1. Select "Custom HF GGUF...", verify inputs appear | |
| 2. Enter a malformed repo_id (no `/`), verify error message with retry button | |
| 3. Enter a well-formed but non-existent repo_id, verify error | |
| 4. Enter an existing repo with a wrong filename, verify error | |
| 5. Enter valid model with correct filename, verify success | |
| 6. Click retry after error, verify it retries | |
| 7. Test fallback to predefined models still works | |
| --- | |
| ## Risk Mitigation | |
| 1. **Custom model loading failures**: Already handled with try/except and user-friendly error messages | |
| 2. **Memory issues with large custom models**: Use conservative defaults (`n_ctx=8192`) and pass `cpu_only=True` on HF Spaces, since `load_custom_model()` defaults to full GPU offload | |
| 3. **UI clutter**: Custom model inputs hidden by default, only show when selected | |
| 4. **Breaking existing functionality**: Feature 1 is additive only; Feature 2 adds optional parameters with `None` defaults to `generate_summary()`, so predefined-model paths keep working | |
| --- | |
| ## Files to Modify | |
| - `/home/luigi/tiny-scribe/app.py` - Main implementation file | |
| ## Estimated Lines Changed | |
| - Feature 1: ~50 lines added, ~20 lines modified | |
| - Feature 2: ~150 lines added, ~30 lines modified | |
| Total: ~250 lines of code changes | |