Vietnamese Translation Module
This module provides Vietnamese translation functionality for the MedAI Processing application using the Helsinki-NLP/opus-mt-en-vi model.
Features
- English to Vietnamese Translation: Translates English text to Vietnamese using the Helsinki-NLP/opus-mt-en-vi model
- Batch Processing: Translates a list of texts in a single call
- Dictionary Translation: Translates specific fields in data dictionaries
- Integration: Integrates with both the SFT and RAG processing workflows
- Error Handling: Graceful fallback to original text if translation fails
- Logging: Logs translation activity and errors for debugging and monitoring
Configuration
Add the following environment variable to your .env file:
```
EN_VI=Helsinki-NLP/opus-mt-en-vi
```
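This README does not say how `translator.py` reads `EN_VI`. Here is a minimal sketch. It assumes the value is taken from the process environment (for example, after the `.env` file has been loaded) and that the model above is the fallback. `resolve_model_name` and `DEFAULT_EN_VI_MODEL` are hypothetical names, not part of the module.

```python
import os
from typing import Mapping, Optional

# Hypothetical default, matching the value suggested for EN_VI above.
DEFAULT_EN_VI_MODEL = "Helsinki-NLP/opus-mt-en-vi"


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID from EN_VI, or the default if it is unset or blank.

    `env` defaults to os.environ; passing a plain dict makes the lookup testable.
    """
    source = os.environ if env is None else env
    value = source.get("EN_VI", "").strip()
    return value or DEFAULT_EN_VI_MODEL
```

Loading a `.env` file into `os.environ` is often done with python-dotenv's `load_dotenv()`. Whether this project does so is not stated here.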
Usage
Basic Translation
```python
from vi.translator import VietnameseTranslator

# Initialize translator
translator = VietnameseTranslator()

# Load the model
translator.load_model()

# Translate a single text
translated = translator.translate_text("Hello, how are you?")

# Translate a batch of texts
texts = ["Text 1", "Text 2", "Text 3"]
translated_batch = translator.translate_batch(texts)
```
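One common way to implement batch translation is to split the input into fixed-size chunks so that memory use stays bounded. The sketch below assumes that approach; it is not necessarily what `translate_batch` does. `translate_in_batches`, `translate_fn`, and `batch_size` are hypothetical. `translate_fn` stands in for the real model call, so the example runs without downloading the model.

```python
from typing import Callable, List, Sequence


def translate_in_batches(
    texts: Sequence[str],
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate texts in fixed-size chunks, preserving input order.

    If a chunk fails (or returns the wrong number of results), its original
    texts are kept, mirroring the "fall back to original text" behaviour.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results: List[str] = []
    for start in range(0, len(texts), batch_size):
        chunk = list(texts[start:start + batch_size])
        try:
            translated = translate_fn(chunk)
            if len(translated) != len(chunk):
                raise ValueError("translation count mismatch")
        except Exception:
            translated = chunk  # fallback: keep the untranslated chunk
        results.extend(translated)
    return results


def fake_model(batch: List[str]) -> List[str]:
    """Stub in place of the real model: upper-cases each text."""
    return [t.upper() for t in batch]


print(translate_in_batches(["a", "b", "c"], fake_model, batch_size=2))  # ['A', 'B', 'C']
```

With `batch_size=2`, the three inputs are sent as two chunks, of sizes 2 and 1.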
Dictionary Translation
```python
# Translate specific fields in a dictionary
data = {
    "instruction": "Answer the question",
    "input": "What is diabetes?",
    "output": "Diabetes is a metabolic disorder..."
}
translated_data = translator.translate_dict(data, ["instruction", "input", "output"])
```
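The dictionary example translates only the listed keys. Below is a minimal sketch of that behaviour, combined with the per-field fallback described under Features. `translate_fields` and `translate_fn` are hypothetical names. Skipping empty or non-string values is an assumption, not documented behaviour of `translate_dict`.

```python
import logging
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def translate_fields(
    data: Dict[str, Any],
    fields: Iterable[str],
    translate_fn: Callable[[str], str],
) -> Dict[str, Any]:
    """Return a copy of `data` with the listed string fields translated.

    Missing, empty, or non-string fields are left untouched, and a field
    whose translation raises keeps its original text.
    """
    result = dict(data)  # shallow copy so the caller's dict is not mutated
    for field in fields:
        value = result.get(field)
        if not isinstance(value, str) or not value:
            continue
        try:
            result[field] = translate_fn(value)
        except Exception:
            logger.warning("Translation failed for field %r; keeping original", field)
    return result
```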
Integration
The translation functionality is automatically integrated into the processing workflows:
- UI Toggle: Users can enable Vietnamese translation via the checkbox in the web interface
- SFT Processing: All text fields in SFT format are translated when enabled
- RAG Processing: All text fields in RAG format are translated when enabled
- Metadata: Translated rows are marked with `vietnamese_translated: true` in their metadata
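The metadata flag can be illustrated with a small wrapper that respects the UI toggle. This sketch makes two assumptions: each row is a dict carrying a nested `metadata` dict, and the flag is set on every row that passes through translation, even if individual fields fell back to the original text. `apply_translation_toggle` and `translate_row` are hypothetical. Serialising with `json.dumps` shows why the flag reads `true`: Python's `True` becomes JSON `true`.

```python
import json
from typing import Any, Callable, Dict


def apply_translation_toggle(
    row: Dict[str, Any],
    translate_row: Callable[[Dict[str, Any]], Dict[str, Any]],
    enabled: bool,
) -> Dict[str, Any]:
    """Translate `row` only when the toggle is on, then mark it in metadata."""
    if not enabled:
        return row  # toggle off: the row passes through unchanged
    translated = translate_row(row)
    # Copy the metadata so the caller's original dict is not mutated.
    metadata = dict(translated.get("metadata") or {})
    metadata["vietnamese_translated"] = True
    return {**translated, "metadata": metadata}


row = {"output": "Hello", "metadata": {"source": "sft"}}
result = apply_translation_toggle(row, lambda r: {**r, "output": "Xin chào"}, enabled=True)
print(json.dumps(result["metadata"]))  # {"source": "sft", "vietnamese_translated": true}
```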
Model Information
- Model: Helsinki-NLP/opus-mt-en-vi
- Source Language: English
- Target Language: Vietnamese
- BLEU Score: 37.2
- chrF Score: 0.542
- License: Apache 2.0
Testing
Run the test script to verify translation functionality:
```bash
python test_translation.py
```
Files
- `translator.py`: Main translation class
- `download.py`: Model download script for Docker
- `processing_utils.py`: Utility functions for processing integration
- `__init__.py`: Module initialization
- `README.md`: This documentation
Notes
- The model is automatically downloaded during Docker build
- Translation runs on the CPU by default but can use a GPU if one is available
- The model requires the target language token `>>vie<<` at the start of each input for proper translation
- All translation operations include error handling and logging
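The last two notes can be sketched together. As noted above, the model needs the `>>vie<<` token in front of each input. The helper below assumes a single space after the token, which is the form used in Hugging Face's Marian examples, and it leaves texts that already carry the token unchanged. Whether `translator.py` adds the token itself is not stated here, so `add_target_token`, `prepare_inputs`, and `pick_device` are hypothetical helpers. `pick_device` keeps the CPU as the default and returns `"cuda"` only when a GPU is requested and PyTorch can see one.

```python
from typing import List

TARGET_TOKEN = ">>vie<<"  # target-language token required by opus-mt-en-vi


def add_target_token(text: str, token: str = TARGET_TOKEN) -> str:
    """Prefix `text` with the target-language token unless it is already there."""
    if text.startswith(token):
        return text
    return f"{token} {text}"


def prepare_inputs(texts: List[str]) -> List[str]:
    """Apply the target-language token to every text in a batch."""
    return [add_target_token(t) for t in texts]


def pick_device(prefer_gpu: bool = False) -> str:
    """Return "cpu" unless a GPU is requested and PyTorch can see one."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: stay on the CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(prepare_inputs(["Hello, how are you?"]))  # ['>>vie<< Hello, how are you?']
print(pick_device())  # cpu
```

With Hugging Face `transformers`, the prepared strings would typically be passed to the tokenizer with `padding=True`, and the resulting tensors moved to the chosen device before calling `generate`. That wiring is left out here because it needs the downloaded model.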