Upload 14 files
- README.md +76 -0
- app.py +416 -0
- core/base_processor.py +113 -0
- core/base_translator.py +93 -0
- core/exceptions.py +27 -0
- processors/docx_processor.py +137 -0
- processors/pdf_processor.py +179 -0
- processors/pptx_processor.py +63 -0
- requirements.txt +16 -0
- translators/chatgpt_translator.py +55 -0
- translators/deepseek_translator.py +58 -0
- utils/constants.py +66 -0
- utils/logger.py +105 -0
- utils/validator.py +174 -0
README.md
ADDED

```markdown
---
title: BabelSlide v2.0
emoji: 🌍
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.46.1
app_file: app.py
pinned: false
license: mit
---

# 🌍 BabelSlide v2.0

Professional document translation application powered by AI. Supports PDF, DOCX, and PPTX formats with intelligent translation using ChatGPT and DeepSeek.

## ✨ Features

- **Multi-format Support**: PDF, Microsoft Word (.docx), PowerPoint (.pptx)
- **AI-Powered Translation**: ChatGPT (GPT-4) and DeepSeek integration
- **Intelligent Processing**: Preserves document structure and formatting
- **Clean Output**: Advanced post-processing removes unwanted LLM commentary
- **Professional UI**: Modern, responsive Streamlit interface
- **Comprehensive Logging**: Detailed process tracking and error handling
- **Translation Reviews**: Automatic quality assessment generation

## 🚀 How to Use

1. **Configure API**: Select your AI provider (ChatGPT/DeepSeek) in the sidebar
2. **Enter API Key**: Provide your API key (required for translation)
3. **Set Languages**: Choose source and target languages
4. **Upload Document**: Drag & drop or select your document (PDF/DOCX/PPTX)
5. **Translate**: Click "🚀 Translate Document" and wait for processing
6. **Download**: Get your translated document and quality review

## 🔑 API Keys

### ChatGPT (OpenAI)
- Get your API key from [OpenAI Platform](https://platform.openai.com/api-keys)
- Format: `sk-...`

### DeepSeek
- Get your API key from [DeepSeek Platform](https://platform.deepseek.com/)
- More cost-effective alternative to ChatGPT

## 📋 Supported Languages

Arabic, Chinese (Simplified/Traditional), Dutch, English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Vietnamese

## 🛡️ Privacy & Security

- **API keys are never stored** - they remain only in your browser session
- **Documents are processed temporarily** - no permanent storage
- **Secure processing** - all data is handled with privacy in mind

## 🚨 Limitations

- **File Size**: Maximum 50 MB per document
- **API Dependencies**: Requires active internet connection and valid API keys
- **PDF Formatting**: Complex layouts may require manual adjustment after translation

## 🏗️ Technical Architecture

- **SOLID Principles**: Clean, modular, maintainable code
- **Abstract Base Classes**: Extensible translator and processor interfaces
- **Comprehensive Error Handling**: Graceful failure management
- **Advanced AI Prompting**: Minimizes hallucinations and unwanted commentary
- **Format Preservation**: Maintains document structure and styling

## 📄 License

This project is licensed under the MIT License.

---

**BabelSlide v2.0** - Breaking language barriers, one document at a time 🌍
```
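The 50 MB limit stated above is checked before any API call is made. A minimal sketch of such a size check (`within_limit` is a hypothetical helper, not code from this repository; the app computes file size the same way, as bytes divided by 1024²):

```python
MAX_MB = 50  # documented per-file limit

def within_limit(size_bytes: int, max_mb: int = MAX_MB) -> bool:
    # convert bytes to megabytes and compare against the documented cap
    return size_bytes / (1024 * 1024) <= max_mb

print(within_limit(49 * 1024 * 1024))  # a 49 MB upload passes
print(within_limit(51 * 1024 * 1024))  # a 51 MB upload is rejected
```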
app.py
ADDED

```python
#!/usr/bin/env python3
"""
BabelSlide v2.0 - Professional Document Translator
Streamlit application for translating PDF, DOCX, and PPTX documents using AI
"""

import streamlit as st
import tempfile
from pathlib import Path
import sys
import os

from translators.chatgpt_translator import ChatGPTTranslator
from translators.deepseek_translator import DeepSeekTranslator
from processors.pdf_processor import PDFProcessor
from processors.docx_processor import DOCXProcessor
from processors.pptx_processor import PPTXProcessor
from utils.constants import LANGUAGES, API_PROVIDERS
from utils.validator import FileValidator
from utils.logger import setup_logger, ProcessLogger
from core.exceptions import (
    BabelSlideException,
    ValidationError,
    UnsupportedFileError,
    FileSizeError,
    APIKeyError,
    TranslationError,
    ProcessorError
)

class BabelSlideStreamlitApp:
    """Streamlit interface for BabelSlide application"""

    def __init__(self):
        self.logger = setup_logger("BabelSlideUI")
        self.process_logger = ProcessLogger(self.logger)

        # Initialize session state
        if 'processing' not in st.session_state:
            st.session_state.processing = False
        if 'translation_result' not in st.session_state:
            st.session_state.translation_result = None
        if 'review_result' not in st.session_state:
            st.session_state.review_result = None

    def setup_page_config(self):
        """Configure Streamlit page"""
        st.set_page_config(
            page_title="BabelSlide - Document Translator",
            page_icon="🌍",
            layout="wide",
            initial_sidebar_state="expanded"
        )

        # Custom CSS
        st.markdown("""
        <style>
        .main-header {
            text-align: center;
            padding: 2rem 0;
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            border-radius: 12px;
            margin-bottom: 2rem;
        }

        .success-box {
            background: #d1fae5;
            border: 1px solid #10b981;
            border-radius: 8px;
            padding: 1rem;
            margin: 1rem 0;
        }

        .error-box {
            background: #fef2f2;
            border: 1px solid #ef4444;
            border-radius: 8px;
            padding: 1rem;
            margin: 1rem 0;
        }

        .info-box {
            background: #eff6ff;
            border: 1px solid #3b82f6;
            border-radius: 8px;
            padding: 1rem;
            margin: 1rem 0;
        }

        .stButton > button {
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            border: none;
            border-radius: 8px;
            padding: 0.5rem 2rem;
            font-weight: 600;
        }
        </style>
        """, unsafe_allow_html=True)

    def render_header(self):
        """Render application header"""
        st.markdown("""
        <div class="main-header">
            <h1>🌍 BabelSlide v2.0</h1>
            <p>Professional Document Translation using AI • PDF • DOCX • PPTX</p>
        </div>
        """, unsafe_allow_html=True)

    def render_sidebar(self):
        """Render configuration sidebar"""
        st.sidebar.markdown("## ⚙️ Configuration")

        # API Provider
        api_provider = st.sidebar.selectbox(
            "AI Provider",
            options=list(API_PROVIDERS.keys()),
            index=0,
            help="Choose your preferred translation AI"
        )

        # API Key
        api_key = st.sidebar.text_input(
            "API Key",
            type="password",
            placeholder="Enter your API key (sk-... for OpenAI)",
            help="Your API key is never stored permanently"
        )

        st.sidebar.markdown("---")

        # Languages
        col1, col2 = st.sidebar.columns(2)

        with col1:
            source_lang = st.selectbox(
                "Source Language",
                options=list(LANGUAGES.keys()),
                index=list(LANGUAGES.keys()).index("English"),
                help="Language of the original document"
            )

        with col2:
            target_lang = st.selectbox(
                "Target Language",
                options=list(LANGUAGES.keys()),
                index=list(LANGUAGES.keys()).index("Polish"),
                help="Language to translate to"
            )

        st.sidebar.markdown("---")
        st.sidebar.markdown("### 📝 Supported Formats")
        st.sidebar.info("• PDF documents\n• DOCX (Word) files\n• PPTX (PowerPoint) presentations")
        st.sidebar.warning("Maximum file size: 50 MB")

        return api_provider, api_key, source_lang, target_lang

    def render_file_upload(self):
        """Render file upload section"""
        st.markdown("## 📄 Document Upload")

        uploaded_file = st.file_uploader(
            "Choose a document to translate",
            type=['pdf', 'docx', 'pptx'],
            help="Upload PDF, DOCX, or PPTX files (max 50 MB)",
            accept_multiple_files=False
        )

        if uploaded_file:
            col1, col2, col3 = st.columns([2, 1, 1])
            with col1:
                st.info(f"📁 **File:** {uploaded_file.name}")
            with col2:
                file_size = len(uploaded_file.getvalue()) / (1024 * 1024)
                st.info(f"📏 **Size:** {file_size:.1f} MB")
            with col3:
                file_type = uploaded_file.name.split('.')[-1].upper()
                st.info(f"📋 **Type:** {file_type}")

        return uploaded_file

    def validate_inputs(self, file, api_provider, api_key, source_lang, target_lang):
        """Validate all inputs before processing"""
        errors = []

        if not file:
            errors.append("Please upload a document")

        if not api_key or not api_key.strip():
            errors.append("Please provide an API key")

        if source_lang == target_lang:
            errors.append("Source and target languages must be different")

        # Validate file if provided
        if file:
            try:
                # Create temporary file for validation
                with tempfile.NamedTemporaryFile(delete=False, suffix=f".{file.name.split('.')[-1]}") as tmp_file:
                    tmp_file.write(file.getvalue())
                    tmp_file_path = Path(tmp_file.name)

                FileValidator.validate_file(tmp_file_path)
                tmp_file_path.unlink()  # Clean up

            except (ValidationError, UnsupportedFileError, FileSizeError) as e:
                errors.append(f"File validation error: {str(e)}")

        # Validate API key format
        try:
            if api_key:
                FileValidator.validate_api_key(api_key.strip(), api_provider)
        except ValidationError as e:
            errors.append(f"API key error: {str(e)}")

        return errors

    def process_document(self, file, api_provider, api_key, source_lang, target_lang):
        """Process document translation"""
        try:
            # Create temporary file
            with tempfile.NamedTemporaryFile(delete=False, suffix=f".{file.name.split('.')[-1]}") as tmp_file:
                tmp_file.write(file.getvalue())
                tmp_file_path = Path(tmp_file.name)

            # Create translator
            if api_provider == "ChatGPT":
                translator = ChatGPTTranslator(api_key.strip())
            elif api_provider == "DeepSeek":
                translator = DeepSeekTranslator(api_key.strip())
            else:
                raise ValueError(f"Unsupported provider: {api_provider}")

            # Create processor based on file extension
            extension = tmp_file_path.suffix.lower()
            if extension == '.pdf':
                processor = PDFProcessor(translator)
            elif extension == '.docx':
                processor = DOCXProcessor(translator)
            elif extension == '.pptx':
                processor = PPTXProcessor(translator)
            else:
                raise ValueError(f"Unsupported file format: {extension}")

            # Progress tracking
            progress_bar = st.progress(0)
            status_text = st.empty()

            def progress_callback(progress_val, desc):
                progress_bar.progress(progress_val)
                status_text.text(desc)

            # Process document
            status_text.text("Starting translation...")
            output_path, summary_text = processor.process_document(
                tmp_file_path,
                source_lang,
                target_lang,
                progress_callback
            )

            # Generate review
            status_text.text("Generating review...")
            review_text = self.generate_review(summary_text, source_lang, translator)

            # Clean up temp file
            tmp_file_path.unlink()

            progress_bar.progress(1.0)
            status_text.text("✅ Translation completed!")

            return output_path, review_text, summary_text

        except Exception as e:
            self.logger.error(f"Translation error: {str(e)}")
            raise

    def generate_review(self, translated_text: str, source_lang: str, translator) -> str:
        """Generate translation review"""
        try:
            system_prompt = f"""Generate a comprehensive translation review in {source_lang} covering:
1. Translation quality assessment
2. Coherence and consistency
3. Technical terminology accuracy
4. Overall readability
5. Recommendations for improvement

Keep the review concise but informative."""

            # Use translator's API to generate review
            review = translator._make_translation_request(
                f"Review this translated text:\n\n{translated_text[:2000]}...",
                "English",
                source_lang
            )

            return translator._clean_translation_output(review)

        except Exception as e:
            return f"Review generation failed: {str(e)}"

    def render_results(self):
        """Render translation results"""
        if st.session_state.translation_result:
            st.markdown("## 📥 Results")

            col1, col2 = st.columns(2)

            with col1:
                st.markdown("### 📄 Translated Document")
                if st.session_state.translation_result:
                    with open(st.session_state.translation_result, 'rb') as file:
                        st.download_button(
                            label="⬇️ Download Translated Document",
                            data=file.read(),
                            file_name=Path(st.session_state.translation_result).name,
                            mime="application/octet-stream"
                        )

            with col2:
                st.markdown("### 📋 Translation Review")
                if st.session_state.review_result:
                    st.download_button(
                        label="⬇️ Download Review",
                        data=st.session_state.review_result,
                        file_name="translation_review.txt",
                        mime="text/plain"
                    )

            # Summary
            if hasattr(st.session_state, 'summary_text') and st.session_state.summary_text:
                st.markdown("### 📝 Translation Summary")
                with st.expander("View Summary", expanded=False):
                    st.text_area(
                        "Summary",
                        value=st.session_state.summary_text[:1000] + "..." if len(st.session_state.summary_text) > 1000 else st.session_state.summary_text,
                        height=200,
                        disabled=True,
                        label_visibility="collapsed"
                    )

    def run(self):
        """Main application loop"""
        self.setup_page_config()
        self.render_header()

        # Sidebar configuration
        api_provider, api_key, source_lang, target_lang = self.render_sidebar()

        # Main content
        uploaded_file = self.render_file_upload()

        # Translation button
        st.markdown("---")
        col1, col2, col3 = st.columns([1, 2, 1])
        with col2:
            translate_button = st.button(
                "🚀 Translate Document",
                disabled=st.session_state.processing,
                use_container_width=True
            )

        # Process translation
        if translate_button:
            # Validate inputs
            errors = self.validate_inputs(uploaded_file, api_provider, api_key, source_lang, target_lang)

            if errors:
                st.error("❌ **Please fix the following errors:**")
                for error in errors:
                    st.error(f"• {error}")
            else:
                st.session_state.processing = True

                try:
                    with st.spinner("Translating document..."):
                        output_path, review_text, summary_text = self.process_document(
                            uploaded_file, api_provider, api_key, source_lang, target_lang
                        )

                    # Store results
                    st.session_state.translation_result = output_path
                    st.session_state.review_result = review_text
                    st.session_state.summary_text = summary_text

                    st.success(f"✅ **Translation completed successfully!**\n\n"
                               f"📄 **File:** {uploaded_file.name}\n"
                               f"🔄 **Translation:** {source_lang} → {target_lang}\n"
                               f"🤖 **Provider:** {api_provider}")

                    # Auto-refresh to show results
                    st.rerun()

                except Exception as e:
                    st.error(f"❌ **Translation failed:** {str(e)}")

                finally:
                    st.session_state.processing = False

        # Show results if available
        self.render_results()

        # Footer
        st.markdown("---")
        st.markdown(
            "<div style='text-align: center; color: #666;'>"
            "<strong>BabelSlide v2.0</strong> • Professional document translation • Built for global communication"
            "</div>",
            unsafe_allow_html=True
        )

# Main entry point
if __name__ == "__main__":
    app = BabelSlideStreamlitApp()
    app.run()
```
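`process_document` selects the processor with an `if`/`elif` chain on the file extension. The same dispatch can be expressed as a lookup table, which keeps the supported-format list in one place; a self-contained sketch (the registry maps extensions to class names only, purely for illustration):

```python
from pathlib import Path

# Hypothetical registry mirroring the if/elif chain in process_document;
# values are class names rather than classes, for illustration only.
PROCESSORS = {'.pdf': 'PDFProcessor', '.docx': 'DOCXProcessor', '.pptx': 'PPTXProcessor'}

def pick_processor(filename: str) -> str:
    ext = Path(filename).suffix.lower()  # normalize, e.g. ".PPTX" -> ".pptx"
    if ext not in PROCESSORS:
        raise ValueError(f"Unsupported file format: {ext}")
    return PROCESSORS[ext]

print(pick_processor("slides.PPTX"))  # PPTXProcessor
```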
core/base_processor.py
ADDED

```python
from abc import ABC, abstractmethod
from typing import List, Tuple, Optional, Generator
from pathlib import Path
from core.base_translator import BaseTranslator
from core.exceptions import ProcessorError
import os

class DocumentProcessor(ABC):
    """Abstract base class for document processors"""

    def __init__(self, translator: BaseTranslator):
        self.translator = translator

    @abstractmethod
    def extract_text_elements(self, file_path: Path) -> Generator[Tuple[str, dict], None, None]:
        """
        Extract text elements from document

        Args:
            file_path: Path to the document

        Yields:
            Tuple of (text_content, metadata) for each translatable element
        """
        pass

    @abstractmethod
    def apply_translations(self, file_path: Path, translations: List[Tuple[str, dict]]) -> Path:
        """
        Apply translations back to the document

        Args:
            file_path: Path to original document
            translations: List of (translated_text, metadata) tuples

        Returns:
            Path to the translated document
        """
        pass

    def process_document(
        self,
        file_path: Path,
        source_lang: str,
        target_lang: str,
        progress_callback: Optional[callable] = None
    ) -> Tuple[Path, str]:
        """
        Process entire document translation

        Args:
            file_path: Path to document
            source_lang: Source language
            target_lang: Target language
            progress_callback: Optional progress callback

        Returns:
            Tuple of (output_file_path, summary_text)
        """
        try:
            # Extract text elements
            text_elements = list(self.extract_text_elements(file_path))
            total_elements = len(text_elements)

            if total_elements == 0:
                raise ProcessorError("No translatable text found in document")

            # Translate each element
            translations = []
            all_translated_text = ""

            for i, (text, metadata) in enumerate(text_elements):
                if text.strip():  # Only translate non-empty text
                    translated = self.translator.translate_text(text, source_lang, target_lang)
                    translations.append((translated, metadata))
                    all_translated_text += translated + "\n"
                else:
                    translations.append((text, metadata))  # Keep empty text as-is

                # Update progress
                if progress_callback:
                    progress_callback((i + 1) / total_elements, f"Translating element {i + 1}/{total_elements}")

            # Apply translations to document
            output_path = self.apply_translations(file_path, translations)

            return output_path, all_translated_text

        except Exception as e:
            raise ProcessorError(f"Document processing failed: {str(e)}")

    def generate_output_path(self, original_path: Path, suffix: str = "translated") -> Path:
        """
        Generate output file path

        Args:
            original_path: Original file path
            suffix: Suffix to add to filename

        Returns:
            New file path with suffix
        """
        stem = original_path.stem
        extension = original_path.suffix
        directory = original_path.parent

        return directory / f"{stem}_{suffix}{extension}"

    @property
    @abstractmethod
    def supported_extensions(self) -> List[str]:
        """Return list of supported file extensions"""
        pass
```
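`DocumentProcessor.process_document` is a template method: subclasses supply extraction and re-application, while the base class owns the translate loop and skips empty elements. A stripped-down, self-contained illustration of the same pattern (toy `Processor`/`ListProcessor` names, not the repository's classes):

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List

class Processor(ABC):
    """Toy stand-in for DocumentProcessor's template method."""

    @abstractmethod
    def extract(self, doc) -> Iterable[str]:
        """Subclass hook: yield translatable elements."""

    def process(self, doc, translate: Callable[[str], str]) -> List[str]:
        # base-class loop: translate non-empty elements, keep empty ones as-is
        return [translate(t) if t.strip() else t for t in self.extract(doc)]

class ListProcessor(Processor):
    def extract(self, doc):
        yield from doc  # here the "document" is just a list of strings

p = ListProcessor()
print(p.process(["hi", "", "bye"], str.upper))  # ['HI', '', 'BYE']
```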
core/base_translator.py
ADDED

```python
from abc import ABC, abstractmethod
from typing import Dict, Any
import re
from utils.constants import STRICT_TRANSLATION_PROMPT, UNWANTED_PATTERNS
from core.exceptions import TranslationError

class BaseTranslator(ABC):
    """Abstract base class for all translators"""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self._validate_api_key()

    @abstractmethod
    def _validate_api_key(self) -> None:
        """Validate API key format and accessibility"""
        pass

    @abstractmethod
    def _make_translation_request(self, text: str, source_lang: str, target_lang: str) -> str:
        """Make the actual API request for translation"""
        pass

    def translate_text(self, text: str, source_lang: str, target_lang: str) -> str:
        """
        Translate text with strict post-processing to remove LLM commentary

        Args:
            text: Text to translate
            source_lang: Source language code
            target_lang: Target language code

        Returns:
            Clean translated text without LLM commentary
        """
        if not text.strip():
            return text

        try:
            # Get translation from API
            translated = self._make_translation_request(text, source_lang, target_lang)

            # Clean the response from unwanted LLM additions
            cleaned = self._clean_translation_output(translated)

            return cleaned

        except Exception as e:
            raise TranslationError(f"Translation failed: {str(e)}")

    def _clean_translation_output(self, output: str) -> str:
        """
        Remove common LLM commentary and formatting artifacts

        Args:
            output: Raw output from LLM

        Returns:
            Cleaned translation text
        """
        cleaned = output.strip()

        # Apply regex patterns to remove unwanted additions
        for pattern in UNWANTED_PATTERNS:
            cleaned = re.sub(pattern, '', cleaned, flags=re.IGNORECASE | re.MULTILINE)

        # Remove excessive whitespace while preserving intentional formatting
        cleaned = re.sub(r'\n{3,}', '\n\n', cleaned)  # Max 2 consecutive newlines
        cleaned = re.sub(r'[ \t]+', ' ', cleaned)  # Normalize spaces

        return cleaned.strip()

    def get_system_prompt(self, source_lang: str, target_lang: str) -> str:
        """
        Get the strict system prompt for translation

        Args:
            source_lang: Source language name
            target_lang: Target language name

        Returns:
            Formatted system prompt
        """
        return STRICT_TRANSLATION_PROMPT.format(
            source_lang=source_lang,
            target_lang=target_lang
        )

    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Return the name of the translation provider"""
        pass
```
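The whitespace normalization at the end of `_clean_translation_output` can be seen in isolation. A self-contained sketch of just those two regex passes (the `UNWANTED_PATTERNS` loop is omitted, since those patterns live in `utils/constants.py` and are not shown in this diff):

```python
import re

def normalize_whitespace(text: str) -> str:
    # same two passes as the tail of _clean_translation_output:
    cleaned = re.sub(r'\n{3,}', '\n\n', text.strip())  # cap blank lines at one
    cleaned = re.sub(r'[ \t]+', ' ', cleaned)          # collapse space/tab runs
    return cleaned.strip()

print(repr(normalize_whitespace("Hello   world\n\n\n\nBye\t!")))
```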
core/exceptions.py
ADDED
|
@@ -0,0 +1,27 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class BabelSlideException(Exception):
    """Base exception for BabelSlide application"""
    pass

class TranslationError(BabelSlideException):
    """Raised when translation fails"""
    pass

class ProcessorError(BabelSlideException):
    """Raised when document processing fails"""
    pass

class ValidationError(BabelSlideException):
    """Raised when input validation fails"""
    pass

class APIKeyError(BabelSlideException):
    """Raised when the API key is invalid or missing"""
    pass

class UnsupportedFileError(BabelSlideException):
    """Raised when the file format is not supported"""
    pass

class FileSizeError(BabelSlideException):
    """Raised when the file is too large"""
    pass
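A sketch of why the single base class is useful: callers can catch every app-specific failure with one except clause. The classes are re-declared here (mirroring core/exceptions.py) so the example is self-contained:

```python
# Re-declared mirrors of core/exceptions.py
class BabelSlideException(Exception):
    """Base exception for BabelSlide application"""

class TranslationError(BabelSlideException):
    """Raised when translation fails"""

class APIKeyError(BabelSlideException):
    """Raised when API key is invalid or missing"""

def translate():
    # Any BabelSlide subclass can be raised here
    raise APIKeyError("Invalid OpenAI API key format. Must start with 'sk-'")

try:
    translate()
except BabelSlideException as exc:  # one clause covers every subclass
    caught = type(exc).__name__

print(caught)  # → APIKeyError
```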
processors/docx_processor.py
ADDED
@@ -0,0 +1,137 @@
from typing import List, Tuple, Generator
from pathlib import Path
from docx import Document
from core.base_processor import DocumentProcessor
from core.exceptions import ProcessorError

class DOCXProcessor(DocumentProcessor):
    """Microsoft Word document processor"""

    def extract_text_elements(self, file_path: Path) -> Generator[Tuple[str, dict], None, None]:
        """Extract text from a Word document"""
        try:
            doc = Document(file_path)

            # Extract text from body paragraphs
            for para_idx, paragraph in enumerate(doc.paragraphs):
                if paragraph.text.strip():
                    metadata = {
                        'element_type': 'paragraph',
                        'index': para_idx,
                        'style': paragraph.style.name if paragraph.style else 'Normal',
                        'original_text': paragraph.text
                    }
                    yield paragraph.text, metadata

            # Extract text from tables
            for table_idx, table in enumerate(doc.tables):
                for row_idx, row in enumerate(table.rows):
                    for cell_idx, cell in enumerate(row.cells):
                        if cell.text.strip():
                            metadata = {
                                'element_type': 'table_cell',
                                'table_index': table_idx,
                                'row_index': row_idx,
                                'cell_index': cell_idx,
                                'original_text': cell.text
                            }
                            yield cell.text, metadata

            # Extract text from headers and footers
            for section_idx, section in enumerate(doc.sections):
                # Header
                if section.header:
                    for para_idx, paragraph in enumerate(section.header.paragraphs):
                        if paragraph.text.strip():
                            metadata = {
                                'element_type': 'header',
                                'section_index': section_idx,
                                'paragraph_index': para_idx,
                                'original_text': paragraph.text
                            }
                            yield paragraph.text, metadata

                # Footer
                if section.footer:
                    for para_idx, paragraph in enumerate(section.footer.paragraphs):
                        if paragraph.text.strip():
                            metadata = {
                                'element_type': 'footer',
                                'section_index': section_idx,
                                'paragraph_index': para_idx,
                                'original_text': paragraph.text
                            }
                            yield paragraph.text, metadata

        except Exception as e:
            raise ProcessorError(f"Failed to extract text from Word document: {str(e)}")

    def apply_translations(self, file_path: Path, translations: List[Tuple[str, dict]]) -> Path:
        """Apply translations to a Word document"""
        try:
            # Load the original document
            doc = Document(file_path)

            # Group translations by element type
            paragraph_translations = {}
            table_translations = {}
            header_translations = {}
            footer_translations = {}

            for translated_text, metadata in translations:
                element_type = metadata['element_type']

                if element_type == 'paragraph':
                    paragraph_translations[metadata['index']] = translated_text
                elif element_type == 'table_cell':
                    table_key = (metadata['table_index'], metadata['row_index'], metadata['cell_index'])
                    table_translations[table_key] = translated_text
                elif element_type == 'header':
                    header_key = (metadata['section_index'], metadata['paragraph_index'])
                    header_translations[header_key] = translated_text
                elif element_type == 'footer':
                    footer_key = (metadata['section_index'], metadata['paragraph_index'])
                    footer_translations[footer_key] = translated_text

            # Apply paragraph translations.
            # Note: assigning paragraph.text replaces all runs, so run-level
            # character formatting (e.g. bold spans inside a paragraph) is lost.
            for para_idx, paragraph in enumerate(doc.paragraphs):
                if para_idx in paragraph_translations:
                    paragraph.text = paragraph_translations[para_idx]

            # Apply table translations
            for table_idx, table in enumerate(doc.tables):
                for row_idx, row in enumerate(table.rows):
                    for cell_idx, cell in enumerate(row.cells):
                        table_key = (table_idx, row_idx, cell_idx)
                        if table_key in table_translations:
                            cell.text = table_translations[table_key]

            # Apply header and footer translations
            for section_idx, section in enumerate(doc.sections):
                # Headers
                if section.header:
                    for para_idx, paragraph in enumerate(section.header.paragraphs):
                        header_key = (section_idx, para_idx)
                        if header_key in header_translations:
                            paragraph.text = header_translations[header_key]

                # Footers
                if section.footer:
                    for para_idx, paragraph in enumerate(section.footer.paragraphs):
                        footer_key = (section_idx, para_idx)
                        if footer_key in footer_translations:
                            paragraph.text = footer_translations[footer_key]

            # Save translated document
            output_path = self.generate_output_path(file_path, "translated")
            doc.save(output_path)

            return output_path

        except Exception as e:
            raise ProcessorError(f"Failed to apply translations to Word document: {str(e)}")

    @property
    def supported_extensions(self) -> List[str]:
        return ['.docx']
processors/pdf_processor.py
ADDED
@@ -0,0 +1,179 @@
from typing import List, Tuple, Generator
from pathlib import Path
import fitz  # PyMuPDF
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase import pdfmetrics
from core.base_processor import DocumentProcessor
from core.exceptions import ProcessorError

class PDFProcessor(DocumentProcessor):
    """PDF document processor"""

    def __init__(self, translator):
        super().__init__(translator)
        # Register a default font that supports Unicode
        try:
            # Try to register a system font that covers multiple scripts
            pdfmetrics.registerFont(TTFont('DejaVuSans', 'DejaVuSans.ttf'))
            self.font_name = 'DejaVuSans'
        except Exception:
            # Fall back to Helvetica if DejaVu is not available
            self.font_name = 'Helvetica'

    def extract_text_elements(self, file_path: Path) -> Generator[Tuple[str, dict], None, None]:
        """Extract text from a PDF"""
        try:
            pdf_document = fitz.open(file_path)

            for page_num in range(len(pdf_document)):
                page = pdf_document[page_num]
                text_blocks = page.get_text("dict")

                for block_idx, block in enumerate(text_blocks["blocks"]):
                    if "lines" in block:  # Text block (image blocks have no lines)
                        block_text = ""
                        for line in block["lines"]:
                            for span in line["spans"]:
                                block_text += span["text"]
                            block_text += "\n"

                        if block_text.strip():
                            metadata = {
                                'page_number': page_num,
                                'block_index': block_idx,
                                'bbox': block["bbox"],  # Bounding box for positioning
                                'original_text': block_text.strip()
                            }
                            yield block_text.strip(), metadata

            pdf_document.close()

        except Exception as e:
            raise ProcessorError(f"Failed to extract text from PDF: {str(e)}")

    def apply_translations(self, file_path: Path, translations: List[Tuple[str, dict]]) -> Path:
        """
        Apply translations to a PDF by creating a new document.
        Note: PDF translation is complex due to formatting preservation;
        this creates a simplified translated version.
        """
        try:
            # Create output path
            output_path = self.generate_output_path(file_path, "translated")

            # Group translations by page
            page_translations = {}
            for translated_text, metadata in translations:
                page_num = metadata['page_number']
                if page_num not in page_translations:
                    page_translations[page_num] = []
                page_translations[page_num].append({
                    'text': translated_text,
                    'bbox': metadata['bbox']
                })

            # Create a new PDF with translations
            pdf_canvas = canvas.Canvas(str(output_path), pagesize=letter)

            # Get original PDF dimensions
            original_pdf = fitz.open(file_path)

            for page_num in range(len(original_pdf)):
                if page_num > 0:
                    pdf_canvas.showPage()

                # Get page dimensions
                page = original_pdf[page_num]
                page_rect = page.rect

                if page_num in page_translations:
                    y_position = page_rect.height - 50  # Start from top

                    for translation_block in page_translations[page_num]:
                        text = translation_block['text']

                        # Set font and size
                        pdf_canvas.setFont(self.font_name, 12)

                        # Handle multi-line text
                        lines = text.split('\n')
                        for line in lines:
                            if line.strip():
                                pdf_canvas.drawString(50, y_position, line.strip())
                                y_position -= 15  # Line spacing

                        y_position -= 10  # Block spacing

            pdf_canvas.save()
            original_pdf.close()

            return output_path

        except Exception as e:
            raise ProcessorError(f"Failed to apply translations to PDF: {str(e)}")

    def create_text_only_pdf(self, file_path: Path, translations: List[Tuple[str, dict]]) -> Path:
        """
        Create a simplified text-only PDF with translations.
        This is a fallback method for complex PDFs.
        """
        try:
            output_path = self.generate_output_path(file_path, "translated_text_only")

            # Group by pages
            page_translations = {}
            for translated_text, metadata in translations:
                page_num = metadata['page_number']
                if page_num not in page_translations:
                    page_translations[page_num] = []
                page_translations[page_num].append(translated_text)

            pdf_canvas = canvas.Canvas(str(output_path), pagesize=letter)

            for page_num in sorted(page_translations.keys()):
                if page_num > 0:
                    pdf_canvas.showPage()

                # Add page title
                pdf_canvas.setFont(self.font_name, 14)
                pdf_canvas.drawString(50, 750, f"Page {page_num + 1}")

                y_position = 720
                pdf_canvas.setFont(self.font_name, 11)

                for text_block in page_translations[page_num]:
                    lines = text_block.split('\n')
                    for line in lines:
                        if line.strip() and y_position > 50:
                            # Handle long lines by wrapping at ~80 characters
                            if len(line) > 80:
                                words = line.split()
                                current_line = ""
                                for word in words:
                                    if len(current_line + word) < 80:
                                        current_line += word + " "
                                    else:
                                        if current_line.strip():
                                            pdf_canvas.drawString(50, y_position, current_line.strip())
                                            y_position -= 12
                                        current_line = word + " "
                                if current_line.strip():
                                    pdf_canvas.drawString(50, y_position, current_line.strip())
                                    y_position -= 12
                            else:
                                pdf_canvas.drawString(50, y_position, line.strip())
                                y_position -= 12
                    y_position -= 8  # Block spacing

            pdf_canvas.save()
            return output_path

        except Exception as e:
            raise ProcessorError(f"Failed to create text-only PDF: {str(e)}")

    @property
    def supported_extensions(self) -> List[str]:
        return ['.pdf']
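The manual 80-character wrapping loop in create_text_only_pdf can be expressed with the stdlib textwrap module. A sketch of the equivalent behavior (width 80 matches the threshold above):

```python
import textwrap

long_line = "translated " * 20  # well past the 80-character threshold
wrapped = textwrap.wrap(long_line, width=80)

# Every emitted line fits, and no words are lost or reordered
assert all(len(line) <= 80 for line in wrapped)
assert " ".join(wrapped).split() == long_line.split()
print(len(wrapped))
```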
processors/pptx_processor.py
ADDED
@@ -0,0 +1,63 @@
from typing import List, Tuple, Generator
from pathlib import Path
from pptx import Presentation
from core.base_processor import DocumentProcessor
from core.exceptions import ProcessorError

class PPTXProcessor(DocumentProcessor):
    """PowerPoint presentation processor"""

    def extract_text_elements(self, file_path: Path) -> Generator[Tuple[str, dict], None, None]:
        """Extract text from PowerPoint slides"""
        try:
            prs = Presentation(file_path)

            for slide_idx, slide in enumerate(prs.slides):
                for shape_idx, shape in enumerate(slide.shapes):
                    if hasattr(shape, "text") and shape.text.strip():
                        metadata = {
                            'slide_index': slide_idx,
                            'shape_index': shape_idx,
                            'shape_type': str(type(shape)),
                            'original_text': shape.text
                        }
                        yield shape.text, metadata

        except Exception as e:
            raise ProcessorError(f"Failed to extract text from PowerPoint: {str(e)}")

    def apply_translations(self, file_path: Path, translations: List[Tuple[str, dict]]) -> Path:
        """Apply translations to a PowerPoint presentation"""
        try:
            # Load the original presentation
            prs = Presentation(file_path)

            # Create a mapping of translations by slide and shape index
            translation_map = {}
            for translated_text, metadata in translations:
                slide_idx = metadata['slide_index']
                shape_idx = metadata['shape_index']
                if slide_idx not in translation_map:
                    translation_map[slide_idx] = {}
                translation_map[slide_idx][shape_idx] = translated_text

            # Apply translations
            for slide_idx, slide in enumerate(prs.slides):
                if slide_idx in translation_map:
                    slide_translations = translation_map[slide_idx]
                    for shape_idx, shape in enumerate(slide.shapes):
                        if shape_idx in slide_translations and hasattr(shape, "text"):
                            shape.text = slide_translations[shape_idx]

            # Save translated presentation
            output_path = self.generate_output_path(file_path, "translated")
            prs.save(output_path)

            return output_path

        except Exception as e:
            raise ProcessorError(f"Failed to apply translations to PowerPoint: {str(e)}")

    @property
    def supported_extensions(self) -> List[str]:
        return ['.pptx']
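The nested translation_map built in apply_translations is plain dictionary grouping. A self-contained sketch with dummy data in the same (text, metadata) shape the processors yield (dict.setdefault is an equivalent shorthand for the if-not-in dance above):

```python
# Hypothetical translated elements, illustrative only
translations = [
    ("Hola", {'slide_index': 0, 'shape_index': 0}),
    ("Mundo", {'slide_index': 0, 'shape_index': 1}),
    ("Adiós", {'slide_index': 1, 'shape_index': 0}),
]

translation_map = {}
for translated_text, metadata in translations:
    slide_idx = metadata['slide_index']
    shape_idx = metadata['shape_index']
    # setdefault creates the inner dict on first sight of a slide index
    translation_map.setdefault(slide_idx, {})[shape_idx] = translated_text

print(translation_map)  # → {0: {0: 'Hola', 1: 'Mundo'}, 1: {0: 'Adiós'}}
```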
requirements.txt
ADDED
@@ -0,0 +1,16 @@
# Core dependencies
streamlit==1.46.1
openai>=1.0.0
python-pptx>=0.6.21
python-docx>=1.1.0
PyMuPDF>=1.23.0  # fitz, for PDF processing
reportlab>=4.0.0  # PDF generation

# Optional dependencies for better PDF support
Pillow>=10.0.0  # Image processing
fonttools>=4.0.0  # Font handling

# Development dependencies (optional)
# pytest>=7.0.0
# black>=23.0.0
# flake8>=6.0.0
translators/chatgpt_translator.py
ADDED
@@ -0,0 +1,55 @@
from openai import OpenAI
from core.base_translator import BaseTranslator
from core.exceptions import APIKeyError, TranslationError
from utils.constants import PROVIDER_MODELS

class ChatGPTTranslator(BaseTranslator):
    """ChatGPT/OpenAI translator implementation"""

    def __init__(self, api_key: str):
        super().__init__(api_key)
        self.client = OpenAI(api_key=api_key)

    def _validate_api_key(self) -> None:
        """Validate OpenAI API key format"""
        if not self.api_key or not self.api_key.startswith('sk-'):
            raise APIKeyError("Invalid OpenAI API key format. Must start with 'sk-'")

    def _make_translation_request(self, text: str, source_lang: str, target_lang: str) -> str:
        """Make a translation request to the OpenAI API"""
        try:
            response = self.client.chat.completions.create(
                model=PROVIDER_MODELS["ChatGPT"],
                messages=[
                    {
                        "role": "system",
                        "content": self.get_system_prompt(source_lang, target_lang)
                    },
                    {
                        "role": "user",
                        "content": text
                    }
                ],
                temperature=0.1,  # Low temperature for consistent translations
                max_tokens=4000,
                top_p=0.9
            )

            if not response.choices or not response.choices[0].message:
                raise TranslationError("Empty response from ChatGPT API")

            return response.choices[0].message.content

        except Exception as e:
            if "invalid_api_key" in str(e).lower():
                raise APIKeyError("Invalid ChatGPT API key")
            elif "rate_limit" in str(e).lower():
                raise TranslationError("Rate limit exceeded. Please try again later.")
            elif "quota" in str(e).lower():
                raise TranslationError("API quota exceeded. Please check your billing.")
            else:
                raise TranslationError(f"ChatGPT API error: {str(e)}")

    @property
    def provider_name(self) -> str:
        return "ChatGPT"
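The substring triage in the except block above can be factored into a small helper. A sketch (the substrings match the ones the translator checks for; the sample messages are illustrative, not real API responses):

```python
# Stand-in exception classes mirroring core/exceptions.py
class APIKeyError(Exception):
    pass

class TranslationError(Exception):
    pass

def classify_error(error_message: str) -> Exception:
    """Map a raw API error string to the app-specific exception."""
    msg = error_message.lower()
    if "invalid_api_key" in msg:
        return APIKeyError("Invalid ChatGPT API key")
    if "rate_limit" in msg:
        return TranslationError("Rate limit exceeded. Please try again later.")
    if "quota" in msg:
        return TranslationError("API quota exceeded. Please check your billing.")
    return TranslationError(f"ChatGPT API error: {error_message}")

print(type(classify_error("Error 401: invalid_api_key")).__name__)  # → APIKeyError
```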
translators/deepseek_translator.py
ADDED
@@ -0,0 +1,58 @@
from openai import OpenAI
from core.base_translator import BaseTranslator
from core.exceptions import APIKeyError, TranslationError
from utils.constants import API_PROVIDERS, PROVIDER_MODELS

class DeepSeekTranslator(BaseTranslator):
    """DeepSeek translator implementation"""

    def __init__(self, api_key: str):
        super().__init__(api_key)
        self.client = OpenAI(
            api_key=api_key,
            base_url=API_PROVIDERS["DeepSeek"]
        )

    def _validate_api_key(self) -> None:
        """Validate DeepSeek API key format"""
        if not self.api_key or len(self.api_key) < 10:
            raise APIKeyError("Invalid DeepSeek API key format")

    def _make_translation_request(self, text: str, source_lang: str, target_lang: str) -> str:
        """Make a translation request to the DeepSeek API"""
        try:
            response = self.client.chat.completions.create(
                model=PROVIDER_MODELS["DeepSeek"],
                messages=[
                    {
                        "role": "system",
                        "content": self.get_system_prompt(source_lang, target_lang)
                    },
                    {
                        "role": "user",
                        "content": text
                    }
                ],
                temperature=0.1,  # Low temperature for consistent translations
                max_tokens=4000,
                top_p=0.9
            )

            if not response.choices or not response.choices[0].message:
                raise TranslationError("Empty response from DeepSeek API")

            return response.choices[0].message.content

        except Exception as e:
            if "invalid_api_key" in str(e).lower() or "unauthorized" in str(e).lower():
                raise APIKeyError("Invalid DeepSeek API key")
            elif "rate_limit" in str(e).lower():
                raise TranslationError("Rate limit exceeded. Please try again later.")
            elif "quota" in str(e).lower():
                raise TranslationError("API quota exceeded. Please check your billing.")
            else:
                raise TranslationError(f"DeepSeek API error: {str(e)}")

    @property
    def provider_name(self) -> str:
        return "DeepSeek"
utils/constants.py
ADDED
@@ -0,0 +1,66 @@
from typing import Dict

# Supported languages
LANGUAGES: Dict[str, str] = {
    "Arabic": "ar",
    "Chinese (Simplified)": "zh",
    "Chinese (Traditional)": "zh-TW",
    "Dutch": "nl",
    "English": "en",
    "French": "fr",
    "German": "de",
    "Greek": "el",
    "Hindi": "hi",
    "Indonesian": "id",
    "Italian": "it",
    "Japanese": "ja",
    "Korean": "ko",
    "Polish": "pl",
    "Portuguese": "pt",
    "Russian": "ru",
    "Spanish": "es",
    "Swedish": "sv",
    "Thai": "th",
    "Turkish": "tr",
    "Vietnamese": "vi"
}

# API providers configuration
API_PROVIDERS: Dict[str, str] = {
    "ChatGPT": "https://api.openai.com/v1",
    "DeepSeek": "https://api.deepseek.com"
}

# Models for each provider
PROVIDER_MODELS: Dict[str, str] = {
    "ChatGPT": "gpt-4o",
    "DeepSeek": "deepseek-chat"
}

# Strict translation system prompt to prevent extra commentary
STRICT_TRANSLATION_PROMPT = """You are a professional translator. Your task is to translate text accurately while maintaining the original formatting and structure.

CRITICAL RULES:
1. Return ONLY the translated text
2. Do NOT add explanations, comments, or notes
3. Do NOT add quotation marks around the translation
4. Do NOT add phrases like "Here is the translation:" or "The translation is:"
5. Preserve original formatting, line breaks, and punctuation
6. If you cannot translate something, return it unchanged
7. Do NOT be conversational - just translate

Translate from {source_lang} to {target_lang}. Return only the translated text."""

# Post-processing patterns to clean unwanted LLM additions
UNWANTED_PATTERNS = [
    r'^(Here is the translation|The translation is|Translation|Translated text):?\s*',
    r'^"([^"]*)"$',  # Surrounding quotes (captures the content to keep)
    r'^\s*[-•]\s*',  # Stray bullet points
    r'\n\n(Note|Comment|Explanation):.*$',  # Trailing notes
]

# File size limit (in MB)
MAX_FILE_SIZE_MB = 50

# Supported file extensions
SUPPORTED_EXTENSIONS = ['.pptx', '.pdf', '.docx']
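Each pattern in UNWANTED_PATTERNS targets one class of LLM artifact. A quick self-check pairing every pattern with a sample it is meant to catch (patterns copied from the list above; samples are illustrative):

```python
import re

# Copied from the UNWANTED_PATTERNS list above
UNWANTED_PATTERNS = [
    r'^(Here is the translation|The translation is|Translation|Translated text):?\s*',
    r'^"([^"]*)"$',
    r'^\s*[-•]\s*',
    r'\n\n(Note|Comment|Explanation):.*$',
]

samples = [
    "Here is the translation: Hola",    # leading commentary
    '"Hola"',                           # surrounding quotes
    "- Hola",                           # stray bullet
    "Hola\n\nNote: informal register",  # trailing note
]

for pattern, sample in zip(UNWANTED_PATTERNS, samples):
    assert re.search(pattern, sample, flags=re.IGNORECASE | re.MULTILINE), pattern
print("all patterns match their target artifacts")
```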
utils/logger.py
ADDED
@@ -0,0 +1,105 @@
import logging
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional

class ColoredFormatter(logging.Formatter):
    """Custom formatter with colors for console output"""

    COLORS = {
        'DEBUG': '\033[36m',     # Cyan
        'INFO': '\033[32m',      # Green
        'WARNING': '\033[33m',   # Yellow
        'ERROR': '\033[31m',     # Red
        'CRITICAL': '\033[35m',  # Magenta
        'ENDC': '\033[0m'        # End color
    }

    def format(self, record):
        log_color = self.COLORS.get(record.levelname, self.COLORS['ENDC'])
        original_levelname = record.levelname
        record.levelname = f"{log_color}{original_levelname}{self.COLORS['ENDC']}"
        try:
            return super().format(record)
        finally:
            # Restore the plain levelname so other handlers (e.g. the file
            # handler) do not receive ANSI color codes
            record.levelname = original_levelname

def setup_logger(name: str = "BabelSlide", level: int = logging.INFO, log_file: Optional[Path] = None) -> logging.Logger:
    """
    Set up a logger with console and optional file output

    Args:
        name: Logger name
        level: Logging level
        log_file: Optional file path for logging

    Returns:
        Configured logger instance
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Clear existing handlers
    logger.handlers.clear()

    # Console handler with colors
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(level)
    console_formatter = ColoredFormatter(
        '%(asctime)s | %(levelname)s | %(name)s | %(message)s',
        datefmt='%H:%M:%S'
    )
    console_handler.setFormatter(console_formatter)
    logger.addHandler(console_handler)

    # File handler if specified
    if log_file:
        log_file.parent.mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(level)
        file_formatter = logging.Formatter(
            '%(asctime)s | %(levelname)s | %(name)s | %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )
        file_handler.setFormatter(file_formatter)
        logger.addHandler(file_handler)

    return logger

class ProcessLogger:
    """Logger for tracking document processing progress"""

    def __init__(self, logger: logging.Logger):
        self.logger = logger
        self.start_time = None
        self.current_step = None
        self.total_steps = None

    def start_process(self, total_steps: int, process_name: str = "Processing"):
        """Start a new process with progress tracking"""
        self.start_time = datetime.now()
        self.total_steps = total_steps
        self.current_step = 0
        self.logger.info(f"Started {process_name} - Total steps: {total_steps}")

    def log_step(self, step_name: str, step_number: Optional[int] = None):
        """Log completion of a processing step"""
        if step_number is not None:
            self.current_step = step_number
        else:
            self.current_step += 1

        if self.total_steps:
            progress = (self.current_step / self.total_steps) * 100
            self.logger.info(f"Step {self.current_step}/{self.total_steps} ({progress:.1f}%): {step_name}")
        else:
            self.logger.info(f"Step {self.current_step}: {step_name}")

    def finish_process(self, success: bool = True):
        """Mark the process as finished"""
        if self.start_time:
            duration = datetime.now() - self.start_time
            status = "completed successfully" if success else "failed"
            self.logger.info(f"Process {status} in {duration.total_seconds():.2f} seconds")

        # Reset state
        self.start_time = None
        self.current_step = None
        self.total_steps = None
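ProcessLogger's progress line is simple arithmetic over a step counter. A self-contained sketch that captures the log output with a list handler (the ListHandler class here is an illustration for testing, not part of utils/logger.py):

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Collects formatted log messages into a list for inspection."""
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger("progress-demo")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())

# Same "Step i/total (pct%)" format ProcessLogger.log_step produces
total_steps = 4
for step, name in enumerate(["extract", "translate", "apply", "save"], start=1):
    progress = (step / total_steps) * 100
    logger.info(f"Step {step}/{total_steps} ({progress:.1f}%): {name}")

print(records[-1])  # → Step 4/4 (100.0%): save
```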
utils/validator.py
ADDED
@@ -0,0 +1,174 @@
import os
from pathlib import Path
from typing import List, Optional
from core.exceptions import ValidationError, UnsupportedFileError, FileSizeError
from utils.constants import MAX_FILE_SIZE_MB, SUPPORTED_EXTENSIONS, LANGUAGES

class FileValidator:
    """Validator for file uploads and processing parameters"""

    @staticmethod
    def validate_file(file_path: Path) -> None:
        """
        Validate uploaded file

        Args:
            file_path: Path to the file to validate

        Raises:
            ValidationError: If file is invalid
            UnsupportedFileError: If file format is not supported
            FileSizeError: If file is too large
        """
        if not file_path.exists():
            raise ValidationError(f"File does not exist: {file_path}")

        if not file_path.is_file():
            raise ValidationError(f"Path is not a file: {file_path}")

        # Check file extension
        extension = file_path.suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            raise UnsupportedFileError(
                f"Unsupported file format: {extension}. "
                f"Supported formats: {', '.join(SUPPORTED_EXTENSIONS)}"
            )

        # Check file size
        file_size_mb = file_path.stat().st_size / (1024 * 1024)
        if file_size_mb > MAX_FILE_SIZE_MB:
            raise FileSizeError(
                f"File too large: {file_size_mb:.1f}MB. "
                f"Maximum allowed size: {MAX_FILE_SIZE_MB}MB"
            )

        # Check if file is readable
        try:
            with open(file_path, 'rb') as f:
                f.read(1024)  # Try to read first KB
        except Exception as e:
            raise ValidationError(f"Cannot read file: {str(e)}")

    @staticmethod
    def validate_language(language: str) -> str:
        """
        Validate and normalize language input

        Args:
            language: Language name or code

        Returns:
            Normalized language name

        Raises:
            ValidationError: If language is not supported
        """
        if not language:
            raise ValidationError("Language cannot be empty")

        # Check if it's a valid language name
        if language in LANGUAGES:
            return language

        # Check if it's a valid language code
        for name, code in LANGUAGES.items():
            if code == language:
                return name

        raise ValidationError(
            f"Unsupported language: {language}. "
            f"Supported languages: {', '.join(LANGUAGES.keys())}"
        )

    @staticmethod
    def validate_api_key(api_key: str, provider: str) -> None:
        """
        Validate API key format

        Args:
            api_key: API key to validate
            provider: API provider name

        Raises:
            ValidationError: If API key is invalid
        """
        if not api_key or not api_key.strip():
            raise ValidationError("API key cannot be empty")

        api_key = api_key.strip()

        if provider == "ChatGPT":
            if not api_key.startswith('sk-'):
                raise ValidationError("OpenAI API key must start with 'sk-'")
            if len(api_key) < 20:
                raise ValidationError("OpenAI API key appears too short")

        elif provider == "DeepSeek":
            if len(api_key) < 10:
                raise ValidationError("DeepSeek API key appears too short")

        else:
            raise ValidationError(f"Unknown provider: {provider}")

    @staticmethod
    def validate_translation_params(
        source_lang: str,
        target_lang: str,
        api_provider: str,
        api_key: str
    ) -> tuple[str, str]:
        """
        Validate all translation parameters

        Args:
            source_lang: Source language
            target_lang: Target language
            api_provider: API provider name
            api_key: API key

        Returns:
            Tuple of normalized (source_lang, target_lang)

        Raises:
            ValidationError: If any parameter is invalid
        """
        # Validate languages
        norm_source = FileValidator.validate_language(source_lang)
        norm_target = FileValidator.validate_language(target_lang)

        if norm_source == norm_target:
            raise ValidationError("Source and target languages cannot be the same")

        # Validate API provider
        if api_provider not in ["ChatGPT", "DeepSeek"]:
            raise ValidationError(f"Unsupported API provider: {api_provider}")

        # Validate API key
        FileValidator.validate_api_key(api_key, api_provider)

        return norm_source, norm_target

    @staticmethod
    def sanitize_filename(filename: str) -> str:
        """
        Sanitize filename for safe file operations

        Args:
            filename: Original filename

        Returns:
            Sanitized filename
        """
        # Remove or replace unsafe characters
        unsafe_chars = '<>:"/\\|?*'
        for char in unsafe_chars:
            filename = filename.replace(char, '_')

        # Remove leading/trailing spaces and dots
        filename = filename.strip(' .')

        # Ensure filename is not empty
        if not filename:
            filename = "translated_document"

        return filename