# Vietnamese Translation Module This module provides Vietnamese translation functionality for the MedAI Processing application using the Helsinki-NLP/opus-mt-en-vi model. ## Features - **English to Vietnamese Translation**: Translates English text to Vietnamese using the Helsinki-NLP/opus-mt-en-vi model - **Batch Processing**: Efficiently translates multiple texts at once - **Dictionary Translation**: Translates specific fields in data dictionaries - **Integration**: Seamlessly integrates with both SFT and RAG processing workflows - **Error Handling**: Graceful fallback to original text if translation fails - **Logging**: Comprehensive logging for debugging and monitoring ## Configuration Add the following environment variable to your `.env` file: ```bash EN_VI=Helsinki-NLP/opus-mt-en-vi ``` ## Usage ### Basic Translation ```python from vi.translator import VietnameseTranslator # Initialize translator translator = VietnameseTranslator() # Load the model translator.load_model() # Translate single text translated = translator.translate_text("Hello, how are you?") # Translate batch of texts texts = ["Text 1", "Text 2", "Text 3"] translated_batch = translator.translate_batch(texts) ``` ### Dictionary Translation ```python # Translate specific fields in a dictionary data = { "instruction": "Answer the question", "input": "What is diabetes?", "output": "Diabetes is a metabolic disorder..." } translated_data = translator.translate_dict(data, ["instruction", "input", "output"]) ``` ## Integration The translation functionality is automatically integrated into the processing workflows: 1. **UI Toggle**: Users can enable Vietnamese translation via the checkbox in the web interface 2. **SFT Processing**: All text fields in SFT format are translated when enabled 3. **RAG Processing**: All text fields in RAG format are translated when enabled 4. **Metadata**: Translated rows are marked with `vietnamese_translated: true` in metadata ## Model Information - **Model**: Helsinki-NLP/opus-mt-en-vi - **Source Language**: English - **Target Language**: Vietnamese - **BLEU Score**: 37.2 - **chrF Score**: 0.542 - **License**: Apache 2.0 ## Testing Run the test script to verify translation functionality: ```bash python test_translation.py ``` ## Files - `translator.py`: Main translation class - `download.py`: Model download script for Docker - `processing_utils.py`: Utility functions for processing integration - `__init__.py`: Module initialization - `README.md`: This documentation ## Notes - The model is automatically downloaded during Docker build - Translation is performed on the CPU by default, but can use GPU if available - The model requires the target language token `>>vie<<` for proper translation - All translation operations include comprehensive error handling and logging