Spaces:
Running on Zero
Running on Zero
bartwisch commited on
Commit ·
376598e
0
Parent(s):
Initial release v1.0.0 – Manga Translator with MIT License
Browse files- .gitignore +34 -0
- LICENSE +21 -0
- README.md +139 -0
- app.py +467 -0
- packages.txt +3 -0
- pages/config.py +327 -0
- requirements-optional.txt +12 -0
- requirements.txt +22 -0
- run.sh +10 -0
- setup.bat +68 -0
- setup.sh +40 -0
- src/__init__.py +0 -0
- src/image_processor.py +171 -0
- src/ocr_handler.py +434 -0
- src/pdf_handler.py +80 -0
- src/translator.py +323 -0
- src/ui_state.py +21 -0
- tests/test_ui_state.py +19 -0
.gitignore
ADDED
|
@@ -0,0 +1,34 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Python
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.py[cod]
|
| 4 |
+
*$py.class
|
| 5 |
+
*.pyc
|
| 6 |
+
|
| 7 |
+
# Virtual Environment
|
| 8 |
+
venv/
|
| 9 |
+
env/
|
| 10 |
+
.env
|
| 11 |
+
|
| 12 |
+
# Streamlit
|
| 13 |
+
.streamlit/
|
| 14 |
+
|
| 15 |
+
# IDE / Editor
|
| 16 |
+
.gemini/
|
| 17 |
+
.vscode/
|
| 18 |
+
.idea/
|
| 19 |
+
*.swp
|
| 20 |
+
*.swo
|
| 21 |
+
|
| 22 |
+
# Internal notes
|
| 23 |
+
TASK.md
|
| 24 |
+
|
| 25 |
+
# Output Files
|
| 26 |
+
translated_manga.pdf
|
| 27 |
+
*.pdf
|
| 28 |
+
*.png
|
| 29 |
+
*.jpg
|
| 30 |
+
*.jpeg
|
| 31 |
+
|
| 32 |
+
# OS Files
|
| 33 |
+
.DS_Store
|
| 34 |
+
Thumbs.db
|
LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2025 Christoph Bartwisch
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the "Software"), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
README.md
ADDED
|
@@ -0,0 +1,139 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# 📚 Manga Translator
|
| 2 |
+
|
| 3 |
+
[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/)
|
| 4 |
+
|
| 5 |
+
An AI-powered application to translate Manga/Comic PDFs from English to German. It preserves the original layout by detecting speech bubbles, removing the original text, and overlaying the translated text.
|
| 6 |
+
|
| 7 |
+
**Repository:** [github.com/bartwisch/mangatranslator](https://github.com/bartwisch/mangatranslator/)
|
| 8 |
+
|
| 9 |
+
## ✨ Features
|
| 10 |
+
|
| 11 |
+
* **Multiple OCR Engines** (Lazy Loading):
|
| 12 |
+
* **Magi** ⭐ (The Manga Whisperer) - Default, best for manga, detects speech bubbles automatically
|
| 13 |
+
* **manga-ocr** - Specialized for manga fonts (optional)
|
| 14 |
+
* **PaddleOCR** - Good general purpose (optional)
|
| 15 |
+
* **EasyOCR** - Multi-language support (optional)
|
| 16 |
+
* **Speech Bubble Grouping**: Automatically groups text lines within speech bubbles for context-aware translation
|
| 17 |
+
* **Multiple Translation Engines**:
|
| 18 |
+
* **Google Translate** (Free)
|
| 19 |
+
* **DeepL** (High Quality, requires API Key)
|
| 20 |
+
* **OpenAI GPT-4o-mini** (Context-aware, requires API Key)
|
| 21 |
+
* **xAI Grok** (Context-aware, requires API Key)
|
| 22 |
+
* **xAI Grok Vision** (No OCR needed, uses vision model)
|
| 23 |
+
* **Smart Layout**: Automatically cleans speech bubbles and fits translated text (dynamically resizing fonts).
|
| 24 |
+
* **Interactive Preview**: Select specific pages to translate visually.
|
| 25 |
+
* **OCR Config Page**: Live preview to tune OCR parameters and bubble grouping.
|
| 26 |
+
* **Cost Estimation**: Shows token usage and estimated costs for AI models.
|
| 27 |
+
|
| 28 |
+
## 🚀 How to Deploy on Streamlit Cloud
|
| 29 |
+
|
| 30 |
+
This is the easiest way to run the app for free.
|
| 31 |
+
|
| 32 |
+
1. Go to **[share.streamlit.io](https://share.streamlit.io/)**.
|
| 33 |
+
2. Click **"New App"**.
|
| 34 |
+
3. Select "Use existing repo" and enter: `bartwisch/mangatranslator`.
|
| 35 |
+
4. Set **Main file path** to `app.py`.
|
| 36 |
+
5. Click **Deploy!** 🎈
|
| 37 |
+
|
| 38 |
+
*Note: The first deployment might take 3-5 minutes because it needs to install PyTorch and OCR models.*
|
| 39 |
+
|
| 40 |
+
## 🛠️ Local Installation
|
| 41 |
+
|
| 42 |
+
### Quick Setup (Recommended)
|
| 43 |
+
|
| 44 |
+
**macOS / Linux:**
|
| 45 |
+
```bash
|
| 46 |
+
git clone https://github.com/bartwisch/mangatranslator.git
|
| 47 |
+
cd mangatranslator
|
| 48 |
+
./setup.sh
|
| 49 |
+
```
|
| 50 |
+
|
| 51 |
+
**Windows:**
|
| 52 |
+
```cmd
|
| 53 |
+
git clone https://github.com/bartwisch/mangatranslator.git
|
| 54 |
+
cd mangatranslator
|
| 55 |
+
setup.bat
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
+
### Manual Installation
|
| 59 |
+
|
| 60 |
+
1. **Clone the repository**:
|
| 61 |
+
```bash
|
| 62 |
+
git clone https://github.com/bartwisch/mangatranslator.git
|
| 63 |
+
cd mangatranslator
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
2. **Set up Python environment**:
|
| 67 |
+
```bash
|
| 68 |
+
python3 -m venv venv
|
| 69 |
+
source venv/bin/activate # Windows: venv\Scripts\activate.bat
|
| 70 |
+
|
| 71 |
+
# Install base requirements (includes Magi OCR)
|
| 72 |
+
pip install -r requirements.txt
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
3. **Optional: Install additional OCR engines**:
|
| 76 |
+
```bash
|
| 77 |
+
# Install all optional engines
|
| 78 |
+
pip install -r requirements-optional.txt
|
| 79 |
+
|
| 80 |
+
# Or install individually:
|
| 81 |
+
pip install manga-ocr paddlepaddle paddleocr # Manga-OCR (plus Paddle dependencies)
|
| 82 |
+
pip install paddlepaddle paddleocr # PaddleOCR only
|
| 83 |
+
pip install easyocr # EasyOCR only
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
4. **Run the app**:
|
| 87 |
+
```bash
|
| 88 |
+
streamlit run app.py
|
| 89 |
+
# Or use: ./run.sh
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
5. **Open in browser**: http://localhost:8501
|
| 93 |
+
|
| 94 |
+
### OCR Configuration Page
|
| 95 |
+
|
| 96 |
+
Navigate to the **Configuration** page in the app to:
|
| 97 |
+
- Select your preferred OCR engine (Magi is default)
|
| 98 |
+
- Choose OCR preprocessing mode
|
| 99 |
+
- Upload a PDF and preview OCR detection
|
| 100 |
+
- Adjust bubble grouping threshold
|
| 101 |
+
- Compare different OCR engines
|
| 102 |
+
|
| 103 |
+
## 🔑 API Keys
|
| 104 |
+
|
| 105 |
+
The app requires API Keys for **DeepL**, **OpenAI**, or **xAI** if you choose to use those services.
|
| 106 |
+
* Keys are entered securely in the Configuration page.
|
| 107 |
+
* Keys are **NOT** stored in the repository.
|
| 108 |
+
* Google Translate is available as a free fallback.
|
| 109 |
+
|
| 110 |
+
## 📋 Requirements
|
| 111 |
+
|
| 112 |
+
* Python 3.10+
|
| 113 |
+
* See `requirements.txt` for base Python packages (includes Magi OCR).
|
| 114 |
+
* See `requirements-optional.txt` for optional OCR engines.
|
| 115 |
+
* See `packages.txt` for system dependencies (required for Linux/Cloud deployment).
|
| 116 |
+
|
| 117 |
+
## 🎯 OCR Engine Comparison
|
| 118 |
+
|
| 119 |
+
| Engine | Best For | Speed | Quality | Installation |
|
| 120 |
+
|--------|----------|-------|---------|--------------|
|
| 121 |
+
| **Magi** ⭐ | Manga (auto bubble detection) | Medium | Excellent | Default ✅ |
|
| 122 |
+
| Manga-OCR | Manga/Comic fonts | Fast | Very Good | Optional |
|
| 123 |
+
| PaddleOCR | General purpose | Fast | Good | Optional |
|
| 124 |
+
| EasyOCR | Multi-language | Slow | Good | Optional |
|
| 125 |
+
|
| 126 |
+
## 📜 License
|
| 127 |
+
|
| 128 |
+
This project is licensed under the **MIT License**. See the `LICENSE` file for details.
|
| 129 |
+
|
| 130 |
+
### Third-Party OCR Libraries
|
| 131 |
+
|
| 132 |
+
This application uses or optionally supports several third-party OCR engines and libraries, including but not limited to:
|
| 133 |
+
|
| 134 |
+
- `magi-ocr` (custom model stack based on PyTorch and Transformers)
|
| 135 |
+
- `manga-ocr` (MIT License)
|
| 136 |
+
- `PaddleOCR` (Apache-2.0 License)
|
| 137 |
+
- `EasyOCR` (Apache-2.0 License)
|
| 138 |
+
|
| 139 |
+
These components are subject to their respective licenses as provided by their authors.
|
app.py
ADDED
|
@@ -0,0 +1,467 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
|
| 2 |
+
import os
|
| 3 |
+
import tempfile
|
| 4 |
+
import certifi
|
| 5 |
+
from src.pdf_handler import PDFHandler
|
| 6 |
+
from src.ocr_handler import OCRHandler
|
| 7 |
+
from src.translator import TranslatorService
|
| 8 |
+
from src.image_processor import ImageProcessor
|
| 9 |
+
from src.ui_state import should_display_thumbnails
|
| 10 |
+
|
| 11 |
+
# Point the SSL stack at certifi's CA bundle for this process so HTTPS
# requests (model downloads, translation APIs) verify certificates correctly.
# NOTE(review): this mutates a process-wide env var at import time — assumed
# intentional for the hosted-Space environment.
os.environ['SSL_CERT_FILE'] = certifi.where()

# Must be the first Streamlit call on the page; configures tab title/icon.
st.set_page_config(page_title="Manga Translator", page_icon="logo.png")

st.title("📚 Manga Translator (English -> German)")
| 17 |
+
|
| 18 |
+
@st.cache_resource
def load_ocr(ocr_engine: str = 'magi'):
    """Create the OCR handler for the requested engine, cached across reruns.

    Engine notes:
      * 'magi'      - best for manga; detects speech bubbles and performs OCR.
      * 'manga-ocr' - specialized for manga/comic fonts.
      * 'paddleocr' - good general-purpose engine.

    ``st.cache_resource`` ensures the (heavy) model stack is built only once
    per engine for the lifetime of the server process.
    """
    handler_config = dict(lang_list=['en'], gpu=False, ocr_engine=ocr_engine)
    return OCRHandler(**handler_config)
| 24 |
+
|
| 25 |
+
def load_translator(service_type: str, api_key: str = None):
    """Build a fresh English->German TranslatorService for the chosen backend.

    Deliberately NOT cached: the API key can change between reruns, and a
    cached instance would keep using a stale credential.
    """
    return TranslatorService(
        source='en',
        target='de',
        service_type=service_type,
        api_key=api_key,
    )
| 28 |
+
|
| 29 |
+
def parse_page_range(range_str: str) -> list[int]:
|
| 30 |
+
"""Parse a page range string (e.g., "1-3, 5, 7-9") into a list of 0-indexed integers."""
|
| 31 |
+
if not range_str.strip():
|
| 32 |
+
return None
|
| 33 |
+
|
| 34 |
+
pages = set()
|
| 35 |
+
parts = range_str.split(',')
|
| 36 |
+
|
| 37 |
+
for part in parts:
|
| 38 |
+
part = part.strip()
|
| 39 |
+
if '-' in part:
|
| 40 |
+
try:
|
| 41 |
+
start, end = map(int, part.split('-'))
|
| 42 |
+
# Convert to 0-indexed, inclusive
|
| 43 |
+
pages.update(range(start - 1, end))
|
| 44 |
+
except ValueError:
|
| 45 |
+
continue
|
| 46 |
+
else:
|
| 47 |
+
try:
|
| 48 |
+
# Convert to 0-indexed
|
| 49 |
+
pages.add(int(part) - 1)
|
| 50 |
+
except ValueError:
|
| 51 |
+
continue
|
| 52 |
+
|
| 53 |
+
return sorted(list(pages))
|
| 54 |
+
|
| 55 |
+
def main():
    """Render the Manga Translator page and drive the translation workflow.

    Flow per Streamlit rerun: initialize session state -> quick-settings
    panel (OpenAI key) -> resolve translation service -> PDF upload ->
    page-selection grid -> translation loop (OCR or vision) -> PDF download
    and preview. Long-running work is gated behind the
    ``translation_in_progress`` session flag so the UI can hide thumbnails
    and offer a Stop button while processing.
    """
    # Session State Initialization
    if 'preview_images' not in st.session_state:
        st.session_state.preview_images = []
    if 'temp_pdf_path' not in st.session_state:
        st.session_state.temp_pdf_path = None
    if 'last_uploaded_file_id' not in st.session_state:
        st.session_state.last_uploaded_file_id = None

    # Persistent API Keys & Settings (defaults if not set in the Config page)
    if 'stored_deepl_key' not in st.session_state:
        st.session_state.stored_deepl_key = ""
    if 'stored_openai_key' not in st.session_state:
        st.session_state.stored_openai_key = ""
    if 'stored_xai_key' not in st.session_state:
        st.session_state.stored_xai_key = ""
    if 'translation_service_selection' not in st.session_state:
        st.session_state.translation_service_selection = "OpenAI GPT-4o-mini (API Key - Recommended)"
    if 'debug_mode_checkbox' not in st.session_state:
        st.session_state.debug_mode_checkbox = True
    if 'show_boxes_checkbox' not in st.session_state:
        st.session_state.show_boxes_checkbox = False
    if 'bubble_threshold_setting' not in st.session_state:
        st.session_state.bubble_threshold_setting = 160
    if 'ocr_engine_selection' not in st.session_state:
        st.session_state.ocr_engine_selection = "magi"
    if 'ocr_preprocess_mode' not in st.session_state:
        st.session_state.ocr_preprocess_mode = "gentle"
    if 'stop_translation' not in st.session_state:
        st.session_state.stop_translation = False

    # Read settings from session state into locals for this rerun.
    service_choice = st.session_state.translation_service_selection
    debug_mode = st.session_state.debug_mode_checkbox
    show_boxes = st.session_state.show_boxes_checkbox
    bubble_threshold = st.session_state.bubble_threshold_setting
    ocr_engine = st.session_state.ocr_engine_selection
    ocr_preprocess = st.session_state.ocr_preprocess_mode

    pdf_handler = PDFHandler()
    image_processor = ImageProcessor()

    # --- Quick Settings Panel ---
    # Expanded by default until an OpenAI key has been entered.
    with st.expander("⚙️ Quick Settings", expanded=not st.session_state.stored_openai_key):
        st.markdown("**OpenAI API Key** (Recommended for best translation quality)")
        col_key_input, col_key_test = st.columns([3, 1])

        with col_key_input:
            # Bound directly to session state via key= (no return value needed).
            st.text_input(
                "Enter your OpenAI API Key",
                type="password",
                key="stored_openai_key",
                help="Your API key is stored securely in this session only.",
                label_visibility="collapsed"
            )

        with col_key_test:
            if st.button("Test Key", key="test_openai_key"):
                if not st.session_state.stored_openai_key:
                    st.error("Please enter an OpenAI API Key before testing.")
                else:
                    try:
                        tester = TranslatorService(
                            source='en',
                            target='de',
                            service_type='openai',
                            api_key=st.session_state.stored_openai_key,
                        )
                        # validate_api_key may not exist on all service
                        # implementations; only call it when present.
                        if hasattr(tester, "validate_api_key"):
                            tester.validate_api_key()
                    except Exception as e:
                        st.error(f"❌ API Key Error: {e}")
                    else:
                        # try/else: success message only when nothing raised.
                        st.success("✓ OpenAI API Key is valid.")

        if st.session_state.stored_openai_key:
            st.success("✓ API Key configured")
        else:
            st.info("💡 Get your API key from [platform.openai.com/api-keys](https://platform.openai.com/api-keys)")

        st.divider()
        st.markdown("🔧 **[Visit Configuration Page](/config)** for advanced settings (OCR engine, translation service, debug options, etc.)")

    # --- Service Configuration Logic ---
    # Map the human-readable selection onto (service_type, api_key, use_vision).
    api_key = None
    service_type = 'google'
    use_vision = False

    if "DeepL" in service_choice:
        service_type = 'deepl'
        api_key = st.session_state.stored_deepl_key

    elif "OpenAI" in service_choice:
        service_type = 'openai'
        api_key = st.session_state.stored_openai_key

    elif "xAI" in service_choice:  # Covers both Grok and Grok Vision
        service_type = 'xai'
        if "Vision" in service_choice:
            use_vision = True
        api_key = st.session_state.stored_xai_key

    # Show API key warning at the top if needed.
    if (service_type in ['deepl', 'openai', 'xai']) and not api_key:
        st.error(f"⚠️ **Missing API Key:** Please enter your {service_type.capitalize()} API Key in the Quick Settings above or visit the **[Configuration](/config)** page.")

    uploaded_file = st.file_uploader("Upload a Manga PDF (English)", type=["pdf"])

    # Only load OCR if NOT using vision mode (vision does its own detection).
    if not use_vision:
        ocr_handler = load_ocr(ocr_engine=ocr_engine)
    else:
        ocr_handler = None

    # Initialize translator (None when a required key is missing or init fails).
    if (service_type in ['deepl', 'openai', 'xai']) and not api_key:
        translator = None
    else:
        try:
            translator = load_translator(service_type=service_type, api_key=api_key)
        except Exception as e:
            st.error(f"❌ **Translation Error:** Failed to initialize translator: {e}")
            translator = None

    if uploaded_file is not None:
        # Detect a NEW upload by name+size fingerprint; only then reset state.
        current_file_id = f"{uploaded_file.name}_{uploaded_file.size}"

        if st.session_state.last_uploaded_file_id != current_file_id:
            # New file detected! Reset state.
            st.session_state.last_uploaded_file_id = current_file_id

            # Cleanup old temp file (best-effort; ignore races/missing file).
            if st.session_state.temp_pdf_path and os.path.exists(st.session_state.temp_pdf_path):
                try:
                    os.remove(st.session_state.temp_pdf_path)
                except:
                    pass

            # Save new temp file. delete=False: the path must outlive this
            # rerun so later reruns can re-read the PDF.
            with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
                tmp_file.write(uploaded_file.read())
                st.session_state.temp_pdf_path = tmp_file.name

            # Generate previews (low resolution: zoom=1).
            with st.spinner("Generating page previews..."):
                st.session_state.preview_images = pdf_handler.extract_images_from_pdf(st.session_state.temp_pdf_path, zoom=1)

            # Initialize all pages as selected.
            for i in range(len(st.session_state.preview_images)):
                st.session_state[f"page_select_{i}"] = True

        st.success("File uploaded successfully!")

        # --- Page Selection UI ---
        st.subheader("Select Pages to Translate")

        should_show_thumbnails = should_display_thumbnails(
            st.session_state.get('translation_in_progress', False)
        )
        start_translation = False

        if should_show_thumbnails:
            # Bulk selection buttons.
            col_sel1, col_sel2, col_sel3, col_sel4 = st.columns([1, 1, 2, 2])
            if col_sel1.button("Select All"):
                for i in range(len(st.session_state.preview_images)):
                    st.session_state[f"page_select_{i}"] = True
                st.rerun()

            if col_sel2.button("Deselect All"):
                for i in range(len(st.session_state.preview_images)):
                    st.session_state[f"page_select_{i}"] = False
                st.rerun()

            # Top Start Translation button.
            if col_sel4.button("🚀 Start Translation", type="primary", key="start_translate_top"):
                # Use session state to trigger translation on the next pass.
                st.session_state.trigger_translation = True

            # Thumbnail grid.
            num_cols = 4
            cols = st.columns(num_cols)

            # CSS tweak for compact checkboxes.
            st.markdown("""
                <style>
                /* Prevent checkbox label wrapping */
                div[data-testid="stCheckbox"] label span {
                    white-space: nowrap;
                }
                </style>
            """, unsafe_allow_html=True)

            for i, img in enumerate(st.session_state.preview_images):
                with cols[i % num_cols]:
                    # Bordered container for the "card" look.
                    with st.container(border=True):
                        key = f"page_select_{i}"
                        # False default prevents accidental selection; the key
                        # should always exist after upload (initialized above).
                        is_selected = st.session_state.get(key, False)

                        # Toggle button acting as the card header.
                        btn_label = f"✅ Page {i+1}" if is_selected else f"⬜ Page {i+1}"
                        btn_type = "primary" if is_selected else "secondary"

                        if st.button(btn_label, key=f"btn_{i}", type=btn_type, width="stretch"):
                            # Flip the selection and redraw immediately.
                            st.session_state[key] = not is_selected
                            st.rerun()

                        # Negative margin pulls the image up closer to the button.
                        st.markdown('<div style="margin-top: -10px;"></div>', unsafe_allow_html=True)
                        st.image(img, width="stretch")

                        # Second toggle button below the image (for clicking
                        # near the image area); minimal icon-only label.
                        toggle_icon = "✓" if is_selected else "○"
                        if st.button(toggle_icon, key=f"img_btn_{i}", width="stretch", help="Click to toggle selection"):
                            st.session_state[key] = not is_selected
                            st.rerun()
        else:
            st.info("Translation has started. Thumbnails are hidden while processing is underway.")

        # NOTE(review): default True here vs. False in the grid above — the
        # grid only reads keys that already exist, so this affects pages whose
        # keys were never initialized; confirm the asymmetry is intended.
        selected_indices = [
            i for i in range(len(st.session_state.preview_images))
            if st.session_state.get(f"page_select_{i}", True)
        ]
        st.write(f"Selected {len(selected_indices)} pages.")

        # --- Translation Trigger (bottom button or top button) ---
        if should_show_thumbnails:
            # Disable the start button while a translation is already running.
            start_translation = st.button("Start Translation", type="primary", key="start_translate_bottom", disabled=st.session_state.get('translation_in_progress', False)) or st.session_state.get('trigger_translation', False)
        else:
            start_translation = st.session_state.get('trigger_translation', False)

        # Reset the one-shot trigger so it fires only once.
        if st.session_state.get('trigger_translation', False):
            st.session_state.trigger_translation = False

        if start_translation:
            if not selected_indices:
                st.error("Please select at least one page.")
            elif translator is None:
                st.error("Translator not initialized. Please check your API Key in Configuration.")
            else:
                # Validate the API key before starting so errors surface early.
                try:
                    if hasattr(translator, "validate_api_key"):
                        translator.validate_api_key()
                except Exception as e:
                    st.error(f"❌ API Key Error: {e}")
                else:
                    # Flag the run and rerun: the next pass hides thumbnails
                    # and enters the processing branch below.
                    st.session_state.translation_in_progress = True
                    st.rerun()

        # Processing branch: runs on the rerun after the flag was set.
        if st.session_state.get('translation_in_progress', False):
            tmp_path = st.session_state.temp_pdf_path

            # Header with a Stop control.
            st.markdown("---")
            col1, col2 = st.columns([3, 1])
            with col1:
                st.subheader("🔄 Translation in Progress")
            with col2:
                if st.button("⏹️ Stop", type="secondary", help="Stop translation and save completed pages"):
                    st.session_state.stop_translation = True
                    st.rerun()
            st.info(f"Translating {len(selected_indices)} selected pages...")

            try:
                # 1. Extract only the selected pages in high resolution (zoom=2).
                status_placeholder = st.empty()
                status_placeholder.info(f"📄 Extracting {len(selected_indices)} pages in high resolution...")

                images = pdf_handler.extract_images_from_pdf(tmp_path, pages=selected_indices, zoom=2)

                processed_images = []
                all_text_data = []  # Collected for debug mode display

                # Progress tracking widgets.
                progress_bar = st.progress(0)
                progress_text = st.empty()

                for i, img in enumerate(images):
                    # Honor a user-requested stop between pages.
                    if st.session_state.get('stop_translation', False):
                        st.warning("⏹️ Translation stopped by user. Saving completed pages...")
                        break

                    original_page_num = selected_indices[i] + 1
                    progress_text.text(f"🔍 Processing page {original_page_num} ({i+1}/{len(images)})...")

                    # Each entry: (bbox, original_text, translated_text).
                    text_regions = []

                    if use_vision:
                        # VISION MODE: the model locates and translates text itself.
                        st.info(f"Analyzing page {original_page_num} with Grok Vision...")
                        vision_results = translator.translate_image_with_vision(img)

                        for item in vision_results:
                            bbox = item['bbox']
                            original = item['original']
                            translated = item['translated']
                            text_regions.append((bbox, original, translated))

                            if debug_mode:
                                all_text_data.append({"Page": original_page_num, "Original": original, "Translated": translated, "Type": "Vision"})

                    else:
                        # CLASSIC OCR MODE
                        # 2. OCR with speech-bubble grouping: merges nearby
                        # text blocks for context-aware translation.
                        ocr_results = ocr_handler.detect_and_group_text(
                            img,
                            distance_threshold=bubble_threshold,
                            preprocess_mode=ocr_preprocess
                        )

                        for bbox, text in ocr_results:
                            # Skip empty or very short detections (noise).
                            if len(text.strip()) < 2:
                                continue

                            # Translate the whole grouped bubble text at once.
                            translated_text = translator.translate_text(text)
                            text_regions.append((bbox, text, translated_text))

                            if debug_mode:
                                all_text_data.append({"Page": original_page_num, "Original": text, "Translated": translated_text, "Type": "OCR"})

                    # 4. Image processing (common to both modes).
                    if show_boxes:
                        # Debug view: draw detection boxes without replacing text.
                        processed_img = image_processor.draw_boxes_only(img.copy(), text_regions)
                    else:
                        processed_img = image_processor.overlay_text(img.copy(), text_regions)
                    processed_images.append(processed_img)

                    progress_bar.progress((i + 1) / len(images))

                # 5. Save result.
                if not processed_images:
                    st.error("No pages were processed. No PDF generated.")
                    st.session_state.translation_in_progress = False
                    st.session_state.stop_translation = False
                    return

                output_pdf_path = "translated_manga.pdf"
                pdf_handler.save_images_as_pdf(processed_images, output_pdf_path)

                # Report partial vs. full completion.
                if st.session_state.get('stop_translation', False):
                    st.warning(f"⏹️ Translation stopped. Saved {len(processed_images)} out of {len(selected_indices)} pages.")
                else:
                    st.success("Translation Complete!")

                # Display cost stats if the translator tracks token usage.
                if hasattr(translator, 'get_usage_stats'):
                    stats = translator.get_usage_stats()
                    if stats['input_tokens'] > 0:
                        st.divider()
                        st.subheader("📊 Cost & Usage Estimate")
                        col_cost1, col_cost2, col_cost3 = st.columns(3)
                        col_cost1.metric("Input Tokens", f"{stats['input_tokens']:,}")
                        col_cost2.metric("Output Tokens", f"{stats['output_tokens']:,}")

                        cost = translator.get_cost_estimate()
                        col_cost3.metric("Estimated Cost", f"${cost:.4f}")
                        st.caption("Note: Cost estimate based on GPT-4o-mini pricing ($0.15/$0.60 per 1M tokens).")

                if debug_mode and all_text_data:
                    st.divider()
                    st.subheader("🐛 Debug: OCR & Translation Data")
                    st.dataframe(all_text_data, width="stretch")

                # Download button for the generated PDF.
                with open(output_pdf_path, "rb") as f:
                    pdf_data = f.read()
                st.download_button(
                    label="Download Translated PDF",
                    data=pdf_data,
                    file_name="translated_manga.pdf",
                    mime="application/pdf"
                )

                # Show preview images (more reliable than a PDF iframe).
                st.divider()
                st.markdown("### 👀 Preview (Processed Pages)")
                for i, p_img in enumerate(processed_images):
                    st.image(p_img, caption=f"Translated Page {selected_indices[i] + 1}", width="stretch")

                # Reset translation flags for the next run.
                st.session_state.translation_in_progress = False
                st.session_state.stop_translation = False

            except Exception as e:
                st.error(f"An error occurred: {e}")
                st.session_state.translation_in_progress = False
                st.session_state.stop_translation = False
            finally:
                # Cleanup - intentionally keep the temp file for retries.
                pass
|
| 465 |
+
|
| 466 |
+
# Script entry point: launch the Streamlit page.
if __name__ == "__main__":
    main()
|
packages.txt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
libgl1
|
| 2 |
+
libgl1-mesa-glx
|
| 3 |
+
libglib2.0-0
|
pages/config.py
ADDED
|
@@ -0,0 +1,327 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
import os
import tempfile
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from src.pdf_handler import PDFHandler

# Streamlit page settings for the configuration screen.
st.set_page_config(page_title="Configuration", page_icon="⚙️", layout="wide")

st.title("⚙️ Configuration")

# Initialize PDF handler globally
pdf_handler = PDFHandler()

# Session state initialization
# config_pdf_path: temp-file path of the PDF uploaded on this page
if 'config_pdf_path' not in st.session_state:
    st.session_state.config_pdf_path = None
# config_previews: low-resolution page renders shown in the selection grid
if 'config_previews' not in st.session_state:
    st.session_state.config_previews = []
# selected_page: index of the page currently opened in the OCR test view
if 'selected_page' not in st.session_state:
    st.session_state.selected_page = None
# ocr_cache: raw OCR results keyed by page/engine/preprocess to avoid reruns
if 'ocr_cache' not in st.session_state:
    st.session_state.ocr_cache = {}
# high_res_images: zoomed page renders, cached per page index
if 'high_res_images' not in st.session_state:
    st.session_state.high_res_images = {}

# --- Global Settings (Session State) ---
# API keys for the optional translation backends (entered in the General tab).
if 'stored_deepl_key' not in st.session_state:
    st.session_state.stored_deepl_key = ""
if 'stored_openai_key' not in st.session_state:
    st.session_state.stored_openai_key = ""
if 'stored_xai_key' not in st.session_state:
    st.session_state.stored_xai_key = ""
# Default translation backend shown in the selectbox below.
if 'translation_service_selection' not in st.session_state:
    st.session_state.translation_service_selection = "OpenAI GPT-4o-mini (API Key - Recommended)"
if 'debug_mode_checkbox' not in st.session_state:
    st.session_state.debug_mode_checkbox = True
if 'show_boxes_checkbox' not in st.session_state:
    st.session_state.show_boxes_checkbox = False
# Default pixel distance used when merging OCR lines into speech bubbles.
if 'bubble_threshold_setting' not in st.session_state:
    st.session_state.bubble_threshold_setting = 160
if 'ocr_engine_selection' not in st.session_state:
    st.session_state.ocr_engine_selection = "magi"
if 'ocr_preprocess_mode' not in st.session_state:
    st.session_state.ocr_preprocess_mode = "gentle"

# Create tabs for different configuration sections
tab_general, tab_ocr_tool = st.tabs(["🌍 General Settings", "🔧 OCR Tool"])
|
| 49 |
+
|
| 50 |
+
with tab_general:
    st.header("Global Application Settings")

    col1, col2 = st.columns(2)

    with col1:
        st.subheader("Translation Service")
        # Human-readable labels; the selected backend is matched by
        # substring ("DeepL" / "OpenAI" / "xAI") below.
        service_options = [
            "OpenAI GPT-4o-mini (API Key - Recommended)",
            "Google Translate (Free - Experimental)",
            "DeepL (API Key - Experimental)",
            "xAI Grok (API Key - Experimental)",
            "xAI Grok Vision (No OCR - Experimental)"
        ]

        # The widget writes directly into session state via its key.
        st.selectbox(
            "Select Translation Service",
            options=service_options,
            key='translation_service_selection'
        )

        service_choice = st.session_state.translation_service_selection

        # Only show the API-key input that matches the chosen service.
        if "DeepL" in service_choice:
            st.text_input("DeepL API Key", type="password", key="stored_deepl_key", help="Paste your DeepL API Key here.")
        elif "OpenAI" in service_choice:
            st.text_input("OpenAI API Key", type="password", key="stored_openai_key", help="Paste your OpenAI API Key here.")
        elif "xAI" in service_choice:
            st.text_input("xAI API Key", type="password", key="stored_xai_key", help="Paste your xAI API Key here.")

    with col2:
        st.subheader("Debug & Display Options")
        st.checkbox("Debug Mode", help="Show OCR text vs. Translation table.", key="debug_mode_checkbox")
        st.checkbox("Show OCR Boxes", help="Zeigt nur die erkannten Textbereiche als Rahmen.", key="show_boxes_checkbox")

        # Global default for bubble grouping; the OCR tool has its own
        # non-persistent test slider on top of this value.
        st.slider(
            "Bubble Grouping Distance (Global)",
            min_value=30,
            max_value=300,
            step=10,
            key="bubble_threshold_setting",
            help="Maximaler Abstand (Pixel) um Textzeilen zu einer Sprechblase zusammenzufassen. Höher = mehr Gruppierung."
        )
|
| 93 |
+
|
| 94 |
+
with tab_ocr_tool:
    st.header("OCR Configuration & Testing Tool")

    # OCR Settings at the top — these keys feed the global session state
    # read by the main translation page.
    st.subheader("🔧 Global OCR Settings")
    col_ocr1, col_ocr2 = st.columns(2)

    with col_ocr1:
        st.selectbox(
            "OCR Engine",
            options=['magi', 'manga-ocr', 'paddleocr', 'easyocr'],
            key='ocr_engine_selection',
            help="'magi' = best for manga (detects speech bubbles) [DEFAULT], 'manga-ocr' = specialized for manga fonts, 'paddleocr' = fast and general purpose, 'easyocr' = multi-language support"
        )

    with col_ocr2:
        st.selectbox(
            "OCR Preprocessing",
            options=['gentle', 'none', 'aggressive'],
            key='ocr_preprocess_mode',
            help="'gentle' = recommended for manga, 'none' = original image, 'aggressive' = strong binarization"
        )

    st.divider()
    st.subheader("📄 Test OCR on PDF Pages")
    st.markdown("Lade ein PDF hoch, klicke auf eine Seite, und passe den Threshold an um die Sprechblasen-Erkennung zu optimieren.")
|
| 120 |
+
|
| 121 |
+
def draw_boxes(image: Image.Image, text_results):
    """Return a copy of *image* with a colored, numbered frame around every
    OCR result.

    Args:
        image: Source page as a PIL image (left unmodified; a copy is drawn on).
        text_results: Iterable of OCR items; each item is a sequence whose
            first element is a 4-point bbox [[x1, y1], ..., [x4, y4]].

    Returns:
        A new PIL image with boxes and 1-based index labels drawn on it.
    """
    img_copy = image.copy()
    draw = ImageDraw.Draw(img_copy)

    # Load a label font once, trying known locations and falling back to
    # PIL's builtin bitmap font. ImageFont.truetype raises OSError when the
    # font file cannot be opened — catch exactly that, not a bare except.
    font = None
    for font_path in ("Arial.ttf", "/System/Library/Fonts/Helvetica.ttc"):
        try:
            font = ImageFont.truetype(font_path, 14)
            break
        except OSError:
            continue
    if font is None:
        font = ImageFont.load_default()

    colors = ["#FF0000", "#0066FF", "#00CC00", "#FF9900", "#9900FF", "#00CCCC", "#FF00FF", "#FFCC00"]

    for i, item in enumerate(text_results):
        bbox = item[0]

        # Axis-aligned bounding rectangle of the (possibly rotated) quad.
        pts = np.array(bbox)
        x_min = int(np.min(pts[:, 0]))
        y_min = int(np.min(pts[:, 1]))
        x_max = int(np.max(pts[:, 0]))
        y_max = int(np.max(pts[:, 1]))

        box_color = colors[i % len(colors)]

        # Draw a 4px-thick frame by stacking 1px rectangles.
        for offset in range(4):
            draw.rectangle(
                [x_min - offset, y_min - offset, x_max + offset, y_max + offset],
                outline=box_color
            )

        # Filled background plus the 1-based index label above the box.
        label = f"[{i+1}]"
        draw.rectangle([x_min, y_min - 20, x_min + 30, y_min], fill=box_color)
        draw.text((x_min + 2, y_min - 18), label, fill="white", font=font)

    return img_copy
|
| 162 |
+
|
| 163 |
+
def select_page(page_num):
    # Button callback: remember which PDF page the user picked so the
    # OCR detail view is rendered on the next Streamlit rerun.
    st.session_state.selected_page = page_num
|
| 165 |
+
|
| 166 |
+
# PDF Upload
uploaded_pdf = st.file_uploader("📄 PDF hochladen (für OCR Config)", type=["pdf"])

if uploaded_pdf:
    # Check if new file.
    # NOTE(review): file_id is computed but never used below — presumably
    # intended to invalidate the caches when a different PDF is uploaded;
    # confirm and either use it or remove it.
    file_id = f"{uploaded_pdf.name}_{uploaded_pdf.size}"

    if st.session_state.config_pdf_path is None or not os.path.exists(st.session_state.config_pdf_path):
        # Save temp PDF so PyMuPDF can open it by path
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
            tmp.write(uploaded_pdf.read())
            st.session_state.config_pdf_path = tmp.name

        # Generate low-resolution previews for the page grid
        with st.spinner("Lade Seiten-Vorschau..."):
            st.session_state.config_previews = pdf_handler.extract_images_from_pdf(
                st.session_state.config_pdf_path, zoom=0.8
            )
        # New document: reset selection and all per-page caches
        st.session_state.selected_page = None
        st.session_state.ocr_cache = {}
        st.session_state.high_res_images = {}

    if st.session_state.config_previews:
        if st.session_state.selected_page is None:
            # Show page grid (selection happens via the button callback)
            st.subheader("📖 Seite auswählen")

            num_cols = 5
            cols = st.columns(num_cols)

            for i, preview in enumerate(st.session_state.config_previews):
                with cols[i % num_cols]:
                    st.markdown(f"Seite {i+1}")

                    st.image(preview, width="stretch")

                    st.button(
                        f"Auswählen",
                        key=f"select_page_{i}",
                        on_click=select_page,
                        args=(i,),
                        type="secondary"
                    )

        # If page selected, show OCR config
        if st.session_state.selected_page is not None:
            if st.button("← Zurück zur Übersicht"):
                st.session_state.selected_page = None
                st.rerun()

            st.divider()

            page_idx = st.session_state.selected_page

            # Sidebar controls for OCR Tool
            st.sidebar.header("🔧 OCR Tool Einstellungen")

            # Local threshold for this tool, defaulting to the global
            # setting; changing it does NOT persist globally.
            tool_bubble_threshold = st.sidebar.slider(
                "Bubble Grouping Distance (Test)",
                min_value=30,
                max_value=400,
                value=st.session_state.bubble_threshold_setting,
                step=10,
                help="Test-Wert für diesen Viewer. Ändert nicht die globale Einstellung."
            )

            ocr_engine = st.sidebar.selectbox(
                "OCR Engine",
                options=['magi', 'manga-ocr', 'paddleocr', 'easyocr'],
                index=0,
                help="'magi' = beste für Manga (erkennt Sprechblasen), 'manga-ocr' = gut, 'paddleocr' = schnell"
            )

            preprocess_mode = st.sidebar.selectbox(
                "OCR Preprocessing",
                options=['gentle', 'none', 'aggressive'],
                index=0,
                help="'gentle' = empfohlen für Manga, 'none' = Originalbild, 'aggressive' = starke Binarisierung"
            )

            show_raw = st.sidebar.checkbox("Zeige Roh-OCR zum Vergleich", value=False)

            st.sidebar.divider()
            st.sidebar.info("💡 **Tipps:**\n- Magi ist am besten für Manga\n- Gentle preprocessing empfohlen")

            # Load high-res image for the selected page (zoom=2), cached
            cache_key = f"page_{page_idx}"

            if cache_key not in st.session_state.high_res_images:
                with st.spinner(f"Lade Seite {page_idx + 1} in hoher Auflösung..."):
                    high_res = pdf_handler.extract_images_from_pdf(
                        st.session_state.config_pdf_path,
                        pages=[page_idx],
                        zoom=2
                    )
                    if high_res:
                        st.session_state.high_res_images[cache_key] = high_res[0]

            if cache_key in st.session_state.high_res_images:
                image = st.session_state.high_res_images[cache_key]

                # Run OCR (cached per page, engine, preprocess mode)
                ocr_key = f"ocr_{page_idx}_{ocr_engine}_{preprocess_mode}"
                if ocr_key not in st.session_state.ocr_cache:
                    with st.spinner(f"🔍 Analysiere Text mit {ocr_engine.upper()}..."):
                        # Lazy load OCR handler here to avoid circular imports if any
                        from src.ocr_handler import OCRHandler
                        ocr_handler_tool = OCRHandler(lang_list=['en'], gpu=False, ocr_engine=ocr_engine)

                        raw_results = ocr_handler_tool.detect_text(
                            image,
                            paragraph=False,
                            preprocess_mode=preprocess_mode
                        )
                        st.session_state.ocr_cache[ocr_key] = raw_results

                raw_results = st.session_state.ocr_cache[ocr_key]

                # Group with current threshold. Grouping is cheap, so it is
                # recomputed on every slider change while raw OCR stays cached.
                from src.ocr_handler import OCRHandler
                ocr_handler_tool = OCRHandler(lang_list=['en'], gpu=False, ocr_engine=ocr_engine)
                grouped_results = ocr_handler_tool.group_text_into_bubbles(raw_results, distance_threshold=tool_bubble_threshold)

                # Display
                st.subheader(f"📄 Seite {page_idx + 1} - OCR Ergebnis")

                if show_raw:
                    # Side-by-side: ungrouped vs. grouped boxes
                    col1, col2 = st.columns(2)

                    with col1:
                        st.markdown(f"**🔴 Roh-OCR: {len(raw_results)} Boxen**")
                        raw_image = draw_boxes(image, raw_results)
                        st.image(raw_image, width="stretch")

                    with col2:
                        st.markdown(f"**🟢 Gruppiert: {len(grouped_results)} Boxen** (Threshold: {tool_bubble_threshold}px)")
                        grouped_image = draw_boxes(image, grouped_results)
                        st.image(grouped_image, width="stretch")
                else:
                    st.markdown(f"**🟢 Gruppiert: {len(grouped_results)} Boxen** (Threshold: {tool_bubble_threshold}px)")
                    grouped_image = draw_boxes(image, grouped_results)
                    st.image(grouped_image, width="stretch")

                # Show detected texts with the same color order as the boxes
                with st.expander(f"📝 Erkannte Texte ({len(grouped_results)} Gruppen)", expanded=True):
                    for i, item in enumerate(grouped_results):
                        text = item[1] if len(item) > 1 else ""
                        colors = ["🔴", "🔵", "🟢", "🟠", "🟣", "🩵", "🩷", "🟡"]
                        color = colors[i % len(colors)]
                        st.markdown(f"{color} **[{i+1}]** {text}")

                # Stats: how strongly grouping reduced the box count
                st.divider()
                col_stat1, col_stat2, col_stat3 = st.columns(3)
                col_stat1.metric("Roh-Boxen", len(raw_results))
                col_stat2.metric("Gruppierte Boxen", len(grouped_results))
                reduction = 100 - (len(grouped_results) / max(len(raw_results), 1) * 100)
                col_stat3.metric("Reduktion", f"{reduction:.0f}%")

else:
    st.info("👆 Lade ein PDF hoch um die OCR-Boxen zu konfigurieren.")
|
requirements-optional.txt
ADDED
|
@@ -0,0 +1,12 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Optional OCR Engines
|
| 2 |
+
# Install only the ones you want to use:
|
| 3 |
+
|
| 4 |
+
# Manga-OCR (specialized for manga/comic fonts)
|
| 5 |
+
manga-ocr
|
| 6 |
+
|
| 7 |
+
# PaddleOCR (fast and general purpose)
|
| 8 |
+
paddlepaddle
|
| 9 |
+
paddleocr
|
| 10 |
+
|
| 11 |
+
# EasyOCR (multi-language support)
|
| 12 |
+
easyocr
|
requirements.txt
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
streamlit
|
| 2 |
+
PyMuPDF
|
| 3 |
+
deep-translator
|
| 4 |
+
opencv-python-headless
|
| 5 |
+
Pillow
|
| 6 |
+
numpy
|
| 7 |
+
scipy
|
| 8 |
+
certifi
|
| 9 |
+
openai
|
| 10 |
+
deepl
|
| 11 |
+
|
| 12 |
+
# Default OCR Engine (Magi - best for manga)
|
| 13 |
+
torch
|
| 14 |
+
torchvision
|
| 15 |
+
transformers
|
| 16 |
+
einops
|
| 17 |
+
timm
|
| 18 |
+
matplotlib
|
| 19 |
+
watchdog
|
| 20 |
+
|
| 21 |
+
# Geometry / spatial dependencies used by Magi
|
| 22 |
+
shapely
|
run.sh
ADDED
|
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash

# Run the Manga Translator Streamlit app.
#
# The project directory is resolved from this script's own location instead
# of a hard-coded absolute path (/Users/christoph/...), so the script works
# on any machine and any checkout location.
cd "$(dirname "$0")" || exit 1

# Activate the virtual environment (created by setup.sh)
source venv/bin/activate

# Run the Streamlit app
streamlit run app.py
|
setup.bat
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
@echo off
REM Manga Translator Setup Script for Windows
REM ==========================================
REM Creates a local virtual environment and installs all dependencies.

echo.
echo 📚 Manga Translator Setup (Windows)
echo ====================================
echo.

REM Check Python
python --version >nul 2>&1
if errorlevel 1 (
    echo ❌ Python not found! Install Python 3.10+ from https://python.org
    pause
    exit /b 1
)

REM NOTE: Tesseract is NOT required by the bundled OCR engines (Magi,
REM manga-ocr, PaddleOCR, EasyOCR) - see requirements.txt and setup.sh,
REM which states that no extra system packages are required. We only
REM report whether it is installed instead of aborting the setup.
where tesseract >nul 2>&1
if errorlevel 1 (
    echo ℹ️  Tesseract OCR not found - not required, continuing setup.
) else (
    echo ✅ Tesseract found
)

REM Create virtual environment
echo.
echo 🐍 Creating Python virtual environment...
python -m venv venv

REM Activate virtual environment
echo 🔌 Activating virtual environment...
call venv\Scripts\activate.bat

REM Upgrade pip
echo.
echo 📦 Upgrading pip...
python -m pip install --upgrade pip

REM Install PyTorch first (CPU version for compatibility)
echo.
echo 📦 Installing PyTorch (CPU)...
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

REM Install other dependencies
echo.
echo 📦 Installing remaining packages (this may take a few minutes)...
pip install -r requirements.txt

echo.
echo ✅ Setup complete!
echo.
echo To start the app:
echo   venv\Scripts\activate.bat
echo   streamlit run app.py
echo.
echo Then open: http://localhost:8501
echo.
pause
|
setup.sh
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash

# Manga Translator Setup Script
# ==============================
# Creates a local virtual environment and installs all Python dependencies.

# Abort on the first failing command so a broken step (e.g. venv creation)
# is not silently ignored and followed by a misleading success message.
set -e

echo "📚 Manga Translator Setup"
echo "========================="

# Check that Python 3 is available (mirrors the check done by setup.bat).
if ! command -v python3 >/dev/null 2>&1; then
    echo "❌ python3 not found! Install Python 3.10+ first."
    exit 1
fi

# Check OS (currently no extra system packages are required)
if [[ "$OSTYPE" == "darwin"* ]]; then
    echo "🍎 macOS detected"
elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
    echo "🐧 Linux detected"
fi

# Create virtual environment
echo ""
echo "🐍 Creating Python virtual environment..."
python3 -m venv venv

# Activate virtual environment
echo "🔌 Activating virtual environment..."
source venv/bin/activate

# Install Python dependencies
echo ""
echo "📦 Installing Python packages (this may take a few minutes)..."
pip install --upgrade pip
pip install -r requirements.txt

echo ""
echo "✅ Setup complete!"
echo ""
echo "To start the app:"
echo "  source venv/bin/activate"
echo "  streamlit run app.py"
echo ""
echo "Then open: http://localhost:8501"
echo ""
echo "OCR Config Page: http://localhost:8501/config"
|
src/__init__.py
ADDED
|
File without changes
|
src/image_processor.py
ADDED
|
@@ -0,0 +1,171 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from PIL import Image, ImageDraw, ImageFont
|
| 2 |
+
import numpy as np
|
| 3 |
+
from typing import List, Tuple
|
| 4 |
+
|
| 5 |
+
class ImageProcessor:
    """Renders translated text (or debug frames) onto manga page images.

    All public methods draw on the passed PIL image *in place* and also
    return it for convenience.
    """

    def __init__(self):
        pass

    def draw_boxes_only(self, image: Image.Image, text_regions: List[Tuple[List[List[int]], str, str]]) -> Image.Image:
        """
        Draws only red frames around the detected text regions (without replacing the text).

        Args:
            image: The original PIL Image (modified in place).
            text_regions: List of tuples (bbox, original_text, translated_text).

        Returns:
            Image with red boxes drawn around text regions.
        """
        draw = ImageDraw.Draw(image)

        # Load the label font once instead of once per region. _load_font
        # never raises (it falls back internally), so the previous bare
        # try/except around it was dead code and has been removed.
        font = self._load_font(12)

        for bbox, original, translated in text_regions:
            # Axis-aligned bounding rectangle of the 4-point polygon
            pts = np.array(bbox)
            x_min = int(np.min(pts[:, 0]))
            y_min = int(np.min(pts[:, 1]))
            x_max = int(np.max(pts[:, 0]))
            y_max = int(np.max(pts[:, 1]))

            # Draw red rectangle outline (3px thick)
            for offset in range(3):
                draw.rectangle(
                    [x_min - offset, y_min - offset, x_max + offset, y_max + offset],
                    outline="red"
                )

            # Truncate the OCR text so the label above the box stays short
            label = original[:50] + "..." if len(original) > 50 else original
            draw.text((x_min, y_min - 15), label, fill="red", font=font)

        return image

    def overlay_text(self, image: Image.Image, text_regions: List[Tuple[List[List[int]], str, str]]) -> Image.Image:
        """
        Overlays translated text onto the image.

        Args:
            image: The original PIL Image (modified in place).
            text_regions: List of tuples (bbox, original_text, translated_text).
                          bbox is [[x1,y1], [x2,y2], [x3,y3], [x4,y4]].

        Returns:
            Processed PIL Image.
        """
        draw = ImageDraw.Draw(image)

        for bbox, original, translated in text_regions:
            # Axis-aligned bounding rectangle of the 4-point polygon
            pts = np.array(bbox)
            x_min = int(np.min(pts[:, 0]))
            y_min = int(np.min(pts[:, 1]))
            x_max = int(np.max(pts[:, 0]))
            y_max = int(np.max(pts[:, 1]))

            # Draw white rectangle (simple inpainting of the original text)
            draw.rectangle([x_min, y_min, x_max, y_max], fill="white", outline="white")

            # Calculate box dimensions
            box_width = x_max - x_min
            box_height = y_max - y_min

            # Fit and draw the translated text into the cleared area
            self._draw_text_in_box(draw, translated, x_min, y_min, box_width, box_height)

        return image

    def _draw_text_in_box(self, draw: ImageDraw.ImageDraw, text: str, x: int, y: int, w: int, h: int):
        """
        Fits text inside a box by iteratively reducing font size and wrapping.

        Args:
            draw: ImageDraw object bound to the target image.
            text: Translated text to render.
            x, y: Top-left corner of the box.
            w, h: Box width and height in pixels.
        """
        import textwrap

        # Minimum legible font size
        min_fontsize = 8
        start_fontsize = 18  # Start ambitious

        padding = 4
        available_w = max(1, w - 2 * padding)
        available_h = max(1, h - 2 * padding)

        best_font = None
        best_wrapped_text = text

        # Iteratively try to fit the text, shrinking the font by 2 each step
        for fontsize in range(start_fontsize, min_fontsize - 1, -2):
            try:
                font = self._load_font(fontsize)

                # Estimate char width by measuring a wide glyph ('M');
                # fall back to the ~0.6 * fontsize heuristic.
                bbox = font.getbbox("M")
                char_w = bbox[2] - bbox[0] if bbox else fontsize * 0.6

                # Maximum characters per line for this font size
                chars_per_line = max(1, int(available_w / char_w))

                # break_long_words=False keeps words intact; if a single word
                # is too wide, the width check below fails and a smaller font
                # is tried instead of hyphenating mid-word.
                wrapped_text = textwrap.fill(text, width=chars_per_line, break_long_words=False)

                # Measure the wrapped block. multiline_textbbox exists since
                # Pillow 8.0; keep the deprecated textsize fallback for older
                # installations.
                if hasattr(draw, 'multiline_textbbox'):
                    text_bbox = draw.multiline_textbbox((0, 0), wrapped_text, font=font)
                    text_h = text_bbox[3] - text_bbox[1]
                    text_w = text_bbox[2] - text_bbox[0]
                else:
                    text_w, text_h = draw.textsize(wrapped_text, font=font)

                # Accept if it fits vertically; allow 10% horizontal overflow
                # because the char-count wrap estimate is approximate.
                if text_h <= available_h and text_w <= available_w * 1.1:
                    best_font = font
                    best_wrapped_text = wrapped_text
                    break  # Found a fit!

            except Exception as e:
                # Best-effort fitting: a failure at one size just means we
                # try the next smaller size.
                print(f"Font fitting error: {e}")
                continue

        # Nothing fit: fall back to the minimum size and accept overflow.
        if best_font is None:
            best_font = self._load_font(min_fontsize)
            bbox = best_font.getbbox("M")
            char_w = bbox[2] - bbox[0] if bbox else min_fontsize * 0.6
            chars_per_line = max(1, int(available_w / char_w))
            best_wrapped_text = textwrap.fill(text, width=chars_per_line)

        # Center text vertically within the box
        if hasattr(draw, 'multiline_textbbox'):
            final_bbox = draw.multiline_textbbox((0, 0), best_wrapped_text, font=best_font)
            final_h = final_bbox[3] - final_bbox[1]
        else:
            _, final_h = draw.textsize(best_wrapped_text, font=best_font)

        center_y = y + (h - final_h) // 2
        center_y = max(y, center_y)  # Don't go above box

        # Draw text (black)
        draw.multiline_text((x + padding, center_y), best_wrapped_text, fill="black", font=best_font, align="center")

    def _load_font(self, fontsize: int):
        """Load a TrueType font at *fontsize*, falling back to PIL's builtin.

        ImageFont.truetype raises OSError when a font file cannot be opened,
        so that is the only exception swallowed here (previously a bare
        ``except:`` which also hid KeyboardInterrupt/SystemExit).
        """
        font_names = ["Arial.ttf", "/System/Library/Fonts/Helvetica.ttc", "/System/Library/Fonts/Supplemental/Arial.ttf", "DejaVuSans.ttf"]
        for name in font_names:
            try:
                return ImageFont.truetype(name, fontsize)
            except OSError:
                continue
        return ImageFont.load_default()
|
| 171 |
+
|
src/ocr_handler.py
ADDED
|
@@ -0,0 +1,434 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import numpy as np
|
| 2 |
+
import cv2
|
| 3 |
+
from PIL import Image
|
| 4 |
+
from typing import List, Tuple, Any, Union, Optional
|
| 5 |
+
from scipy.spatial.distance import cdist
|
| 6 |
+
from scipy.cluster.hierarchy import fcluster, linkage
|
| 7 |
+
|
| 8 |
+
class OCRHandler:
    """Multi-engine OCR front end for manga pages.

    Supports Magi (default), Manga-OCR, PaddleOCR and EasyOCR. Backends are
    loaded lazily on first use so constructing a handler stays cheap.
    """

    def __init__(self, lang_list: Optional[List[str]] = None, gpu: bool = False, ocr_engine: str = 'magi'):
        """
        Initializes the OCR handler with lazy loading.

        Args:
            lang_list: List of languages to detect (default: ['en']).
            gpu: Boolean to enable GPU usage (default: False).
            ocr_engine: 'magi' (default), 'manga-ocr', 'paddleocr', or 'easyocr'.
        """
        self.ocr_engine = ocr_engine
        # BUGFIX: the previous signature used a mutable default (['en']),
        # which is shared across all instances; a fresh list is created per
        # instance instead so one handler cannot mutate another's languages.
        self.lang_list = list(lang_list) if lang_list is not None else ['en']
        self.gpu = gpu

        # Lazy loading - backend objects are created on first use.
        self._magi_model = None
        self._manga_ocr = None
        self._detector = None
        self._paddle_reader = None
        self._easy_reader = None

        print(f"OCR Handler initialized with engine: {ocr_engine} (lazy loading enabled)")
def _load_magi(self):
    """Load and cache the Magi model; return the cached instance on later calls."""
    if self._magi_model is None:
        print("Loading Magi (The Manga Whisperer)...")
        try:
            from transformers import AutoModel
            import torch
            model = AutoModel.from_pretrained("ragavsachdeva/magi", trust_remote_code=True)
            # Move to GPU only when both CUDA is present and the caller asked for it.
            if torch.cuda.is_available() and self.gpu:
                model = model.cuda()
            model.eval()
            self._magi_model = model
            print("✓ Magi loaded successfully")
        except ImportError as e:
            raise ImportError(
                "Magi dependencies not installed. "
                "This should not happen as Magi is the default engine. "
                f"Error: {e}"
            )
    return self._magi_model
def _load_manga_ocr(self):
    """Load and cache Manga-OCR plus a PaddleOCR-based text detector."""
    if self._manga_ocr is not None:
        return self._manga_ocr, self._detector

    print("Loading Manga-OCR...")
    try:
        from manga_ocr import MangaOcr
        from paddleocr import PaddleOCR
        # Manga-OCR only recognizes text; PaddleOCR supplies the detection step.
        self._manga_ocr = MangaOcr()
        self._detector = PaddleOCR(lang='en', show_log=False)
        print("✓ Manga-OCR loaded successfully")
    except ImportError:
        raise ImportError(
            "Manga-OCR not installed. Install with:\n"
            "pip install -r requirements-optional.txt\n"
            "or: pip install manga-ocr paddlepaddle paddleocr"
        )
    return self._manga_ocr, self._detector
def _load_paddleocr(self):
    """Load and cache the PaddleOCR reader on first call."""
    if self._paddle_reader is not None:
        return self._paddle_reader

    print("Loading PaddleOCR...")
    try:
        from paddleocr import PaddleOCR
        self._paddle_reader = PaddleOCR(lang='en', show_log=False)
        print("✓ PaddleOCR loaded successfully")
    except ImportError:
        raise ImportError(
            "PaddleOCR not installed. Install with:\n"
            "pip install paddlepaddle paddleocr"
        )
    return self._paddle_reader
def _load_easyocr(self):
    """Load and cache the EasyOCR reader on first call."""
    if self._easy_reader is not None:
        return self._easy_reader

    print("Loading EasyOCR (this may take a while on first run)...")
    try:
        import easyocr
        # Language list and GPU preference come from the handler configuration.
        self._easy_reader = easyocr.Reader(self.lang_list, gpu=self.gpu)
        print("✓ EasyOCR loaded successfully")
    except ImportError:
        raise ImportError(
            "EasyOCR not installed. Install with:\n"
            "pip install easyocr"
        )
    return self._easy_reader
def preprocess_image(self, image: np.ndarray, mode: str = 'gentle') -> np.ndarray:
    """
    Applies preprocessing to improve OCR quality.

    Args:
        image: Input image as numpy array (RGB).
        mode: Preprocessing mode:
            - 'none': no filtering, only a 3x upscale of the original image
            - 'gentle': light preprocessing (recommended for manga)
            - 'aggressive': heavy preprocessing with binarization

    Returns:
        Preprocessed image (RGB for 'none', grayscale otherwise).
    """
    if mode == 'none':
        # A 3x upscale alone already helps with thin glyphs such as "I".
        return cv2.resize(image, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)

    # Shared base for the remaining modes: grayscale, then 2x upscale
    # (small text benefits from the extra resolution).
    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    scaled = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)

    if mode == 'gentle':
        # Boost contrast with CLAHE rather than hard binarization so thin
        # strokes ("I", "l", ...) survive.
        contrast = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(scaled)
        # Very light denoising to preserve fine detail.
        return cv2.fastNlMeansDenoising(contrast, h=5, templateWindowSize=7, searchWindowSize=21)

    # 'aggressive': stronger denoising followed by Otsu thresholding.
    # NOTE: binarization can destroy thin characters.
    cleaned = cv2.fastNlMeansDenoising(scaled, h=10, templateWindowSize=7, searchWindowSize=21)
    _, binary = cv2.threshold(cleaned, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
def detect_text(self, image: Union[Image.Image, np.ndarray], paragraph: bool = True, preprocess_mode: str = 'gentle', tesseract_psm: int = 6, tesseract_confidence: int = 60) -> List[Tuple[List[Tuple[int, int]], str]]:
    """
    Detects text in an image.

    Args:
        image: PIL Image or numpy array.
        paragraph: If True, combines text lines into paragraphs (only the
            EasyOCR backend honours this flag).
        preprocess_mode: Preprocessing mode ('gentle', 'none', 'aggressive').
        tesseract_psm / tesseract_confidence: Kept for interface
            compatibility; unused by the current backends.

    Returns:
        List of tuples: (bounding_box, text) or (bounding_box, text, confidence).

    Raises:
        ValueError: If ``self.ocr_engine`` names an unknown backend.
    """
    array = np.array(image) if isinstance(image, Image.Image) else image

    # Preprocess once; every backend receives the same processed frame.
    processed = self.preprocess_image(array, mode=preprocess_mode)

    # 'none' upscales 3x, the other modes 2x — detections are scaled back
    # down by the same factor inside each backend.
    factor = 3 if preprocess_mode == 'none' else 2

    dispatch = {
        'magi': lambda: self._detect_with_magi(processed, factor),
        'manga-ocr': lambda: self._detect_with_manga_ocr(processed, factor),
        'paddleocr': lambda: self._detect_with_paddleocr(processed, factor),
        'easyocr': lambda: self._detect_with_easyocr(processed, paragraph, factor),
    }
    if self.ocr_engine not in dispatch:
        raise ValueError(f"Unknown OCR engine: {self.ocr_engine}")
    return dispatch[self.ocr_engine]()
def _detect_with_magi(self, processed_image: np.ndarray, scale_factor: int) -> List[Tuple]:
    """Detect text using Magi - The Manga Whisperer (best for manga)."""
    import torch

    model = self._load_magi()

    # Magi expects an RGB numpy array; expand grayscale to three channels.
    if processed_image.ndim == 2:
        processed_image = np.stack([processed_image] * 3, axis=-1)

    with torch.no_grad():
        # First pass: locate text boxes on the page.
        results = model.predict_detections_and_associations([processed_image])
        text_bboxes = [results[0]["texts"]]
        # Second pass: run OCR on the located boxes.
        ocr_results = model.predict_ocr([processed_image], text_bboxes)

    detections = []
    if results:
        boxes = results[0].get("texts", [])
        texts = ocr_results[0] if ocr_results else []

        for index, box in enumerate(boxes):
            # Magi reports [x1, y1, x2, y2]; rebuild a 4-point polygon in
            # original-image coordinates (undo the preprocessing upscale).
            x1, y1, x2, y2 = box
            sx1, sy1 = int(x1 / scale_factor), int(y1 / scale_factor)
            sx2, sy2 = int(x2 / scale_factor), int(y2 / scale_factor)
            quad = [[sx1, sy1], [sx2, sy1], [sx2, sy2], [sx1, sy2]]

            recognized = texts[index] if index < len(texts) else ""
            if recognized.strip():
                # Magi does not expose a confidence; report a fixed 0.95.
                detections.append((quad, recognized.strip(), 0.95))

    return detections
def _detect_with_manga_ocr(self, processed_image: np.ndarray, scale_factor: int) -> List[Tuple]:
    """Detect text using Manga-OCR - specialized for manga/comic fonts."""
    manga_ocr, detector = self._load_manga_ocr()

    # PaddleOCR localizes text regions; manga-ocr then reads each crop.
    detection_result = detector.ocr(processed_image, cls=False)

    detections = []
    if not detection_result or not detection_result[0]:
        return detections

    height, width = processed_image.shape[0], processed_image.shape[1]
    for item in detection_result[0]:
        # PaddleOCR may yield [[bbox], (text, conf)]; keep only the bbox.
        if isinstance(item[0], list) and isinstance(item[0][0], (list, tuple)):
            bbox_raw = item[0]  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
        else:
            bbox_raw = item  # already a bare bbox

        pts = np.array(bbox_raw).astype(int)
        x_min, y_min = pts.min(axis=0)
        x_max, y_max = pts.max(axis=0)

        # Clamp the crop window to the image bounds.
        x_min, y_min = max(0, x_min), max(0, y_min)
        x_max, y_max = min(width, x_max), min(height, y_max)
        if x_max <= x_min or y_max <= y_min:
            continue

        cropped = processed_image[y_min:y_max, x_min:x_max]
        if cropped.size == 0:
            continue

        try:
            # manga-ocr consumes PIL images.
            text = manga_ocr(Image.fromarray(cropped))
        except Exception as e:
            print(f"Manga-OCR error: {e}")
            continue
        if not text.strip():
            continue

        # Map the bbox back to original-image coordinates.
        quad = [[int(p[0] / scale_factor), int(p[1] / scale_factor)] for p in bbox_raw]
        # No per-box confidence available; report a fixed 0.95.
        detections.append((quad, text.strip(), 0.95))

    return detections
def _detect_with_paddleocr(self, processed_image: np.ndarray, scale_factor: int) -> List[Tuple]:
    """Detect text using PaddleOCR - fast and general purpose."""
    reader = self._load_paddleocr()

    # PaddleOCR accepts BGR or RGB arrays; cls=True enables angle classification.
    result = reader.ocr(processed_image, cls=True)

    detections = []
    # Result shape per page: [[[box], (text, confidence)], ...]
    if result and result[0]:
        for entry in result[0]:
            bbox_raw = entry[0]  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
            text, confidence = entry[1][0], entry[1][1]

            # Discard blanks and low-confidence hits (< 0.5).
            if not text.strip() or confidence < 0.5:
                continue

            # Map coordinates back to the original (pre-upscale) image.
            quad = [[int(p[0] / scale_factor), int(p[1] / scale_factor)] for p in bbox_raw]
            detections.append((quad, text.strip(), confidence))

    return detections
def _detect_with_easyocr(self, processed_image: np.ndarray, paragraph: bool, scale_factor: int) -> List[Tuple]:
    """Detect text using EasyOCR."""
    reader = self._load_easyocr()

    # Thresholds tuned for comic lettering: low contrast/text thresholds
    # catch faint strokes, min_size filters specks, rotation is disabled.
    raw = reader.readtext(
        processed_image,
        paragraph=paragraph,
        contrast_ths=0.05,
        text_threshold=0.5,
        low_text=0.2,
        width_ths=0.5,
        height_ths=0.5,
        min_size=5,
        rotation_info=[0],
    )

    def rescale(box):
        # Map detector coordinates back to the original image.
        return [[int(p[0] / scale_factor), int(p[1] / scale_factor)] for p in box]

    detections = []
    for entry in raw:
        if len(entry) == 2:
            # Paragraph mode yields (bbox, text) without a probability.
            box, text = entry
            detections.append((rescale(box), text))
        elif len(entry) == 3:
            # Default mode yields (bbox, text, probability).
            box, text, prob = entry
            detections.append((rescale(box), text, prob))

    return detections
def get_text_regions(self, image: Union[Image.Image, np.ndarray]) -> List[Any]:
    """Convenience wrapper: run detect_text with its default settings."""
    return self.detect_text(image)
def group_text_into_bubbles(self, text_results: List[Tuple], distance_threshold: float = 50) -> List[Tuple[List[List[int]], str]]:
    """
    Groups nearby text blocks into speech bubbles.

    Args:
        text_results: List of (bbox, text[, confidence]) tuples from detect_text.
        distance_threshold: Maximum center-to-center distance (pixels) for
            two blocks to be merged into the same bubble.

    Returns:
        List of (merged_bbox, combined_text) tuples; confidences are dropped.
    """
    if not text_results:
        return []

    if len(text_results) == 1:
        # A single block needs no clustering; return it directly.
        bbox, text = text_results[0][:2]
        return [(bbox, text)]

    # Each block is represented by the center of its bounding box.
    centers = np.array([
        [np.mean(np.array(item[0])[:, 0]), np.mean(np.array(item[0])[:, 1])]
        for item in text_results
    ])

    # Average-linkage hierarchical clustering, cut at the distance threshold.
    if len(centers) > 1:
        clusters = fcluster(linkage(centers, method='average'), t=distance_threshold, criterion='distance')
    else:
        clusters = [1]

    # Bucket block indices by cluster id (insertion order preserved).
    buckets = {}
    for idx, cluster_id in enumerate(clusters):
        buckets.setdefault(cluster_id, []).append(idx)

    merged = []
    for indices in buckets.values():
        # Read blocks top-to-bottom within a bubble.
        ordered = sorted(indices, key=lambda i: np.mean(np.array(text_results[i][0])[:, 1]))

        points = []
        texts = []
        for idx in ordered:
            points.extend(text_results[idx][0])
            texts.append(text_results[idx][1])

        # One axis-aligned rectangle covering every point of the bubble.
        pts = np.array(points)
        x_min, y_min = int(pts[:, 0].min()), int(pts[:, 1].min())
        x_max, y_max = int(pts[:, 0].max()), int(pts[:, 1].max())
        bubble_bbox = [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]

        # Join texts with spaces for a natural reading flow.
        merged.append((bubble_bbox, ' '.join(texts)))

    return merged
def detect_and_group_text(self, image: Union[Image.Image, np.ndarray], distance_threshold: float = 50, preprocess_mode: str = 'gentle') -> List[Tuple[List[List[int]], str]]:
    """
    Detects text and automatically groups it into speech bubbles.

    Args:
        image: PIL Image or numpy array.
        distance_threshold: Maximum grouping distance (in pixels).
        preprocess_mode: Preprocessing mode ('gentle', 'none', 'aggressive').

    Returns:
        List of (bbox, combined_text) tuples, one per speech bubble.
    """
    # Detect individual blocks first (paragraph=False for finer control),
    # then merge them by spatial proximity.
    raw = self.detect_text(image, paragraph=False, preprocess_mode=preprocess_mode)
    return self.group_text_into_bubbles(raw, distance_threshold)
src/pdf_handler.py
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import fitz # PyMuPDF
|
| 2 |
+
from PIL import Image
|
| 3 |
+
import io
|
| 4 |
+
import os
|
| 5 |
+
from typing import List, Union
|
| 6 |
+
|
| 7 |
+
class PDFHandler:
    """Converts PDF pages to PIL images and writes images back to PDF."""

    def __init__(self):
        # Stateless utility; nothing to initialize.
        pass
def extract_images_from_pdf(self, pdf_path: str, zoom: int = 2, pages: List[int] = None) -> List[Image.Image]:
    """
    Converts each page of the PDF into a PIL Image.

    Args:
        pdf_path: Path to the source PDF file.
        zoom: Zoom factor for higher resolution (default 2 for better OCR).
        pages: Optional list of 0-indexed page numbers to extract. If None, extracts all.

    Returns:
        List of PIL Image objects.

    Raises:
        FileNotFoundError: If pdf_path does not exist.
    """
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")

    doc = fitz.open(pdf_path)
    try:
        # Zoom matrix renders pages at higher resolution for better OCR.
        mat = fitz.Matrix(zoom, zoom)

        # Determine which pages to process.
        if pages is None:
            page_indices = range(len(doc))
        else:
            # Silently drop out-of-range page numbers.
            page_indices = [p for p in pages if 0 <= p < len(doc)]

        images = []
        for page_num in page_indices:
            pix = doc.load_page(page_num).get_pixmap(matrix=mat)
            # Round-trip through PNG bytes to obtain a PIL Image.
            images.append(Image.open(io.BytesIO(pix.tobytes("png"))))
        return images
    finally:
        # BUGFIX: the document handle was previously leaked whenever page
        # rendering raised; close it unconditionally.
        doc.close()
def save_images_as_pdf(self, images: List[Image.Image], output_path: str):
    """
    Saves a list of PIL Images as a single PDF file.

    Args:
        images: List of PIL Image objects.
        output_path: Path where the new PDF should be saved.
    """
    if not images:
        print("No images to save.")
        return

    # Pillow's PDF writer cannot handle alpha channels. BUGFIX: previously
    # only 'RGBA' was converted, so 'LA'/'PA' images (and palette images
    # carrying transparency) crashed the save; convert all of them to RGB.
    pdf_images = []
    for img in images:
        if img.mode in ('RGBA', 'LA', 'PA') or (img.mode == 'P' and 'transparency' in img.info):
            img = img.convert('RGB')
        pdf_images.append(img)

    if pdf_images:
        pdf_images[0].save(
            output_path,
            save_all=True,
            append_images=pdf_images[1:],
            resolution=100.0,
            quality=95,
            optimize=True
        )
        print(f"PDF saved successfully at {output_path}")
src/translator.py
ADDED
|
@@ -0,0 +1,323 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from deep_translator import GoogleTranslator
|
| 2 |
+
import deepl
|
| 3 |
+
from openai import OpenAI
|
| 4 |
+
from typing import List, Union, Optional
|
| 5 |
+
import base64
|
| 6 |
+
import io
|
| 7 |
+
import json
|
| 8 |
+
from PIL import Image
|
| 9 |
+
|
| 10 |
+
class TranslatorService:
|
| 11 |
+
def __init__(self, source: str = 'en', target: str = 'de', service_type: str = 'google', api_key: Optional[str] = None):
|
| 12 |
+
"""
|
| 13 |
+
Initializes the Translator Service.
|
| 14 |
+
|
| 15 |
+
Args:
|
| 16 |
+
source: Source language code (default: 'en').
|
| 17 |
+
target: Target language code (default: 'de').
|
| 18 |
+
service_type: 'google', 'deepl', 'openai', or 'xai'.
|
| 19 |
+
api_key: API Key for DeepL, OpenAI or xAI.
|
| 20 |
+
"""
|
| 21 |
+
self.service_type = service_type
|
| 22 |
+
self.api_key = api_key
|
| 23 |
+
self.target = target
|
| 24 |
+
self.source = source
|
| 25 |
+
self.usage = {'input_tokens': 0, 'output_tokens': 0}
|
| 26 |
+
|
| 27 |
+
if self.service_type == 'deepl':
|
| 28 |
+
print("Using DeepL Translator")
|
| 29 |
+
if not self.api_key:
|
| 30 |
+
raise ValueError("DeepL API Key is required for DeepL service.")
|
| 31 |
+
self.translator = deepl.Translator(self.api_key)
|
| 32 |
+
|
| 33 |
+
elif self.service_type == 'openai':
|
| 34 |
+
print("Using OpenAI (GPT-4o-mini) Translator")
|
| 35 |
+
if not self.api_key:
|
| 36 |
+
raise ValueError("OpenAI API Key is required for OpenAI service.")
|
| 37 |
+
self.client = OpenAI(api_key=self.api_key)
|
| 38 |
+
|
| 39 |
+
elif self.service_type == 'xai':
|
| 40 |
+
print("Using xAI Grok Translator")
|
| 41 |
+
if not self.api_key:
|
| 42 |
+
raise ValueError("xAI API Key is required for Grok service.")
|
| 43 |
+
# xAI API is OpenAI-compatible
|
| 44 |
+
self.client = OpenAI(api_key=self.api_key, base_url="https://api.x.ai/v1")
|
| 45 |
+
|
| 46 |
+
else:
|
| 47 |
+
print("Using Google Translator (deep-translator)")
|
| 48 |
+
self.translator = GoogleTranslator(source=source, target=target)
|
| 49 |
+
|
| 50 |
+
def get_usage_stats(self):
|
| 51 |
+
"""Returns accumulated token usage."""
|
| 52 |
+
return self.usage
|
| 53 |
+
|
| 54 |
+
def get_cost_estimate(self):
|
| 55 |
+
"""
|
| 56 |
+
Returns estimated cost in USD based on GPT-4o-mini pricing.
|
| 57 |
+
Input: $0.15 / 1M tokens
|
| 58 |
+
Output: $0.60 / 1M tokens
|
| 59 |
+
"""
|
| 60 |
+
input_cost = (self.usage['input_tokens'] / 1_000_000) * 0.15
|
| 61 |
+
output_cost = (self.usage['output_tokens'] / 1_000_000) * 0.60
|
| 62 |
+
return input_cost + output_cost
|
| 63 |
+
|
| 64 |
+
def validate_api_key(self) -> None:
|
| 65 |
+
"""Performs a lightweight test call to validate the configured API key.
|
| 66 |
+
|
| 67 |
+
Raises:
|
| 68 |
+
Exception: If the key is invalid or the provider returns an auth error.
|
| 69 |
+
"""
|
| 70 |
+
# Google (deep-translator) does not use an API key
|
| 71 |
+
if self.service_type not in ['deepl', 'openai', 'xai']:
|
| 72 |
+
return
|
| 73 |
+
|
| 74 |
+
if self.service_type == 'deepl':
|
| 75 |
+
# Minimal ping using the official client
|
| 76 |
+
try:
|
| 77 |
+
# This will raise an exception on invalid auth
|
| 78 |
+
_ = self.translator.get_usage()
|
| 79 |
+
except Exception as e:
|
| 80 |
+
raise Exception(f"DeepL API key seems invalid or not authorized: {e}")
|
| 81 |
+
return
|
| 82 |
+
|
| 83 |
+
# OpenAI / xAI
|
| 84 |
+
try:
|
| 85 |
+
model = "gpt-4o-mini" if self.service_type == 'openai' else "grok-4-mini"
|
| 86 |
+
# Very small test prompt to minimize cost
|
| 87 |
+
response = self.client.chat.completions.create(
|
| 88 |
+
model=model,
|
| 89 |
+
messages=[
|
| 90 |
+
{"role": "user", "content": "test"}
|
| 91 |
+
],
|
| 92 |
+
max_tokens=1,
|
| 93 |
+
temperature=0.0,
|
| 94 |
+
)
|
| 95 |
+
# If we get here without exception, we assume the key works.
|
| 96 |
+
if response.usage:
|
| 97 |
+
self.usage['input_tokens'] += response.usage.prompt_tokens
|
| 98 |
+
self.usage['output_tokens'] += response.usage.completion_tokens
|
| 99 |
+
except Exception as e:
|
| 100 |
+
raise Exception(f"{self.service_type.capitalize()} API key seems invalid or the service is not reachable: {e}")
|
| 101 |
+
|
| 102 |
+
def translate_image_with_vision(self, image: Image.Image) -> List[dict]:
|
| 103 |
+
"""
|
| 104 |
+
Uses VLM (Vision Language Model) to detect and translate text directly from image.
|
| 105 |
+
Returns list of dicts: {'bbox': [x1, y1, x2, y2], 'original': str, 'translated': str}
|
| 106 |
+
"""
|
| 107 |
+
if self.service_type not in ['openai', 'xai']:
|
| 108 |
+
raise ValueError("Vision features only supported for OpenAI and xAI services.")
|
| 109 |
+
|
| 110 |
+
# 1. Letterbox the image to be square (helps with coordinate accuracy)
|
| 111 |
+
old_width, old_height = image.size
|
| 112 |
+
new_size = max(old_width, old_height)
|
| 113 |
+
square_img = Image.new("RGB", (new_size, new_size), (255, 255, 255))
|
| 114 |
+
|
| 115 |
+
# Paste original image centered or top-left? Top-left is easier for coord math.
|
| 116 |
+
square_img.paste(image, (0, 0))
|
| 117 |
+
|
| 118 |
+
# Convert to base64
|
| 119 |
+
buffered = io.BytesIO()
|
| 120 |
+
square_img.save(buffered, format="JPEG")
|
| 121 |
+
img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
|
| 122 |
+
img_url = f"data:image/jpeg;base64,{img_str}"
|
| 123 |
+
|
| 124 |
+
model = "gpt-4o-mini" if self.service_type == 'openai' else "grok-4-latest"
|
| 125 |
+
|
| 126 |
+
prompt = f"""
|
| 127 |
+
You are a Manga Translator Agent.
|
| 128 |
+
Look at this manga page. Identify all speech bubbles and text boxes.
|
| 129 |
+
For each text region:
|
| 130 |
+
1. Extract the English text.
|
| 131 |
+
2. Translate it to German.
|
| 132 |
+
3. Estimate the bounding box as [ymin, xmin, ymax, xmax] using a 0-1000 normalized scale based on this square image.
|
| 133 |
+
- (0,0) is top-left corner.
|
| 134 |
+
- (1000,1000) is bottom-right corner.
|
| 135 |
+
- Be extremely precise with the coordinates.
|
| 136 |
+
- The image might have white padding on the right or bottom, ignore that area.
|
| 137 |
+
|
| 138 |
+
Return ONLY a valid JSON array with this structure:
|
| 139 |
+
[
|
| 140 |
+
{{
|
| 141 |
+
"original": "English text",
|
| 142 |
+
"translated": "German translation",
|
| 143 |
+
"bbox": [ymin, xmin, ymax, xmax]
|
| 144 |
+
}}
|
| 145 |
+
]
|
| 146 |
+
Do not use markdown code blocks. Return raw JSON only.
|
| 147 |
+
"""
|
| 148 |
+
|
| 149 |
+
try:
|
| 150 |
+
response = self.client.chat.completions.create(
|
| 151 |
+
model=model,
|
| 152 |
+
messages=[
|
| 153 |
+
{
|
| 154 |
+
"role": "user",
|
| 155 |
+
"content": [
|
| 156 |
+
{"type": "text", "text": prompt},
|
| 157 |
+
{
|
| 158 |
+
"type": "image_url",
|
| 159 |
+
"image_url": {"url": img_url}
|
| 160 |
+
}
|
| 161 |
+
],
|
| 162 |
+
}
|
| 163 |
+
],
|
| 164 |
+
max_tokens=2000,
|
| 165 |
+
temperature=0.1
|
| 166 |
+
)
|
| 167 |
+
|
| 168 |
+
# Track usage
|
| 169 |
+
if response.usage:
|
| 170 |
+
self.usage['input_tokens'] += response.usage.prompt_tokens
|
| 171 |
+
self.usage['output_tokens'] += response.usage.completion_tokens
|
| 172 |
+
|
| 173 |
+
content = response.choices[0].message.content.strip()
|
| 174 |
+
# Cleanup markdown if present
|
| 175 |
+
if content.startswith("```json"):
|
| 176 |
+
content = content[7:]
|
| 177 |
+
if content.endswith("```"):
|
| 178 |
+
content = content[:-3]
|
| 179 |
+
|
| 180 |
+
data = json.loads(content.strip())
|
| 181 |
+
|
| 182 |
+
results = []
|
| 183 |
+
for item in data:
|
| 184 |
+
ymin, xmin, ymax, xmax = item['bbox']
|
| 185 |
+
|
| 186 |
+
# Clamp values 0-1000
|
| 187 |
+
ymin = max(0, min(1000, ymin))
|
| 188 |
+
xmin = max(0, min(1000, xmin))
|
| 189 |
+
ymax = max(0, min(1000, ymax))
|
| 190 |
+
xmax = max(0, min(1000, xmax))
|
| 191 |
+
|
| 192 |
+
# Convert from 0-1000 scale relative to the SQUARE image
|
| 193 |
+
abs_x_min = int((xmin / 1000) * new_size)
|
| 194 |
+
abs_y_min = int((ymin / 1000) * new_size)
|
| 195 |
+
abs_x_max = int((xmax / 1000) * new_size)
|
| 196 |
+
abs_y_max = int((ymax / 1000) * new_size)
|
| 197 |
+
|
| 198 |
+
# Clip to original image dimensions (remove padding area results)
|
| 199 |
+
abs_x_min = min(abs_x_min, old_width)
|
| 200 |
+
abs_y_min = min(abs_y_min, old_height)
|
| 201 |
+
abs_x_max = min(abs_x_max, old_width)
|
| 202 |
+
abs_y_max = min(abs_y_max, old_height)
|
| 203 |
+
|
| 204 |
+
# Ensure valid box
|
| 205 |
+
if abs_x_max > abs_x_min and abs_y_max > abs_y_min:
|
| 206 |
+
bbox_points = [
|
| 207 |
+
[abs_x_min, abs_y_min], # Top-Left
|
| 208 |
+
[abs_x_max, abs_y_min], # Top-Right
|
| 209 |
+
[abs_x_max, abs_y_max], # Bottom-Right
|
| 210 |
+
[abs_x_min, abs_y_max] # Bottom-Left
|
| 211 |
+
]
|
| 212 |
+
|
| 213 |
+
results.append({
|
| 214 |
+
'bbox': bbox_points,
|
| 215 |
+
'original': item.get('original', ''),
|
| 216 |
+
'translated': item.get('translated', '')
|
| 217 |
+
})
|
| 218 |
+
|
| 219 |
+
return results
|
| 220 |
+
|
| 221 |
+
except Exception as e:
|
| 222 |
+
print(f"Vision translation error: {e}")
|
| 223 |
+
return []
|
| 224 |
+
|
| 225 |
+
def translate_text(self, text: str) -> str:
    """Translate a single string into the configured target language.

    Args:
        text: Source text to translate.

    Returns:
        The translated string, "" for blank input, or the original text
        unchanged if the translation backend raises an error.
    """
    if not text.strip():
        return ""

    try:
        if self.service_type == 'deepl':
            # DeepL auto-detects the source language when source_lang is None.
            translated = self.translator.translate_text(
                text, source_lang=None, target_lang=self.target
            )
            return translated.text

        if self.service_type in ('openai', 'xai'):
            # Pick the chat model matching the configured provider.
            chosen_model = "gpt-4o-mini" if self.service_type == 'openai' else "grok-4-latest"
            system_prompt = (
                f"You are a professional manga translator. Translate the following text "
                f"from {self.source} to {self.target}. Keep the translation natural and "
                f"fitting for a comic/manga context. Ensure correct handling of German "
                f"special characters like ä, ö, ü, ß. Only return the translated text, "
                f"nothing else."
            )

            response = self.client.chat.completions.create(
                model=chosen_model,
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": text},
                ],
                temperature=0.3,
            )

            # Accumulate token usage for cost reporting.
            usage = response.usage
            if usage:
                self.usage['input_tokens'] += usage.prompt_tokens
                self.usage['output_tokens'] += usage.completion_tokens

            return response.choices[0].message.content.strip()

        # Any other backend exposes a plain translate() method.
        return self.translator.translate(text)
    except Exception as e:
        # Best-effort: never crash the pipeline; fall back to the source text.
        print(f"Translation error: {e}")
        return text
|
| 263 |
+
|
| 264 |
+
def translate_batch(self, texts: List[str]) -> List[str]:
    """Translate a list of strings, batching in one request where possible.

    Args:
        texts: Source strings to translate.

    Returns:
        Translations in the same order as ``texts``.  On any service error
        the original list is returned unchanged (best-effort).
    """
    import re  # local import: only needed for parsing the LLM response

    if not texts:
        return []

    try:
        if self.service_type == 'deepl':
            # DeepL accepts a list directly; source language is auto-detected.
            results = self.translator.translate_text(texts, source_lang=None, target_lang=self.target)
            return [r.text for r in results]

        elif self.service_type in ['openai', 'xai']:
            # Select model based on service
            model = "gpt-4o-mini" if self.service_type == 'openai' else "grok-4-latest"

            # OpenAI/xAI batch approach: send one numbered list, parse it back.
            formatted_text = "\n".join([f"{i+1}. {t}" for i, t in enumerate(texts)])
            prompt = f"Translate the following numbered lines from {self.source} to {self.target}. Return them as a numbered list with the same indices.\n\n{formatted_text}"

            response = self.client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": f"You are a professional manga translator. Translate the text from {self.source} to {self.target}. Return ONLY the numbered list of translations."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.3
            )

            # Track usage
            if response.usage:
                self.usage['input_tokens'] += response.usage.prompt_tokens
                self.usage['output_tokens'] += response.usage.completion_tokens

            content = response.choices[0].message.content.strip()

            # Parse the numbered list back into plain strings.  Only a
            # leading "<digits>. " / "<digits>) " marker is stripped, so a
            # translation that merely *contains* ". " stays intact (the
            # previous ``'. ' in line`` split silently truncated such lines
            # without tripping the count check below).  Blank separator
            # lines are skipped instead of producing empty entries that
            # would needlessly trigger the per-item fallback.
            translated_lines = []
            for line in content.split('\n'):
                if not line.strip():
                    continue
                translated_lines.append(re.sub(r'^\s*\d+[.)]\s*', '', line))

            # Fallback if counts don't match (model dropped or merged
            # lines): translate each string individually instead.
            if len(translated_lines) != len(texts):
                return [self.translate_text(t) for t in texts]

            return translated_lines

        else:
            return self.translator.translate_batch(texts)
    except Exception as e:
        # Best-effort: never crash the pipeline; return the source texts.
        print(f"Batch translation error: {e}")
        return texts
|
src/ui_state.py
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Utilities for managing UI states across the Streamlit app."""
|
| 2 |
+
|
| 3 |
+
from typing import Any
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def should_display_thumbnails(translation_in_progress: bool) -> bool:
    """Decide whether the page thumbnail strip should be rendered.

    Thumbnails are hidden while a translation run is active and shown
    otherwise.

    Args:
        translation_in_progress: True while the translation workflow runs.

    Returns:
        True when thumbnails should be shown, otherwise False.

    Raises:
        ValueError: If ``translation_in_progress`` is not a bool.
    """
    if isinstance(translation_in_progress, bool):
        return not translation_in_progress
    raise ValueError("translation_in_progress must be a boolean")
|
tests/test_ui_state.py
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import pytest
|
| 2 |
+
|
| 3 |
+
from src.ui_state import should_display_thumbnails
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def test_should_display_thumbnails_returns_true_when_not_in_progress():
    """Idle state: the thumbnail strip must be visible."""
    result = should_display_thumbnails(False)
    assert result is True
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def test_should_display_thumbnails_returns_false_during_progress():
    """Active translation: the thumbnail strip must be hidden."""
    result = should_display_thumbnails(True)
    assert result is False
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
def test_should_display_thumbnails_rejects_non_boolean_input():
    """Non-bool state values indicate a bug; the helper must fail loudly."""
    pytest.raises(ValueError, should_display_thumbnails, "yes")
|