# Vietnamese Translation Module
This module provides Vietnamese translation functionality for the MedAI Processing application using the Helsinki-NLP/opus-mt-en-vi model.
## Features
- **English to Vietnamese Translation**: Translates English text to Vietnamese using the Helsinki-NLP/opus-mt-en-vi model
- **Batch Processing**: Efficiently translates multiple texts at once
- **Dictionary Translation**: Translates specific fields in data dictionaries
- **Integration**: Seamlessly integrates with both SFT and RAG processing workflows
- **Error Handling**: Graceful fallback to original text if translation fails
- **Logging**: Comprehensive logging for debugging and monitoring
## Configuration
Add the following environment variable to your `.env` file:
```bash
EN_VI=Helsinki-NLP/opus-mt-en-vi
```
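The README does not show how `EN_VI` is consumed, so the following is only a minimal sketch of one way to read it. The helper name `resolve_model_name` and the fallback default are assumptions for illustration, not part of the module's documented API.

```python
import os
from typing import Mapping, Optional

# Assumed fallback: the model ID this README names throughout.
DEFAULT_EN_VI_MODEL = "Helsinki-NLP/opus-mt-en-vi"


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model ID from EN_VI, or the default if it is unset or blank.

    `resolve_model_name` is a hypothetical helper, not a function from vi/.
    """
    env = os.environ if env is None else env
    value = env.get("EN_VI", "").strip()
    return value or DEFAULT_EN_VI_MODEL


if __name__ == "__main__":
    print(resolve_model_name())
```

Note that a `.env` file is not read by Python on its own; the variable has to reach the process environment first, for example through Docker or a dotenv loader.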
## Usage
### Basic Translation
```python
from vi.translator import VietnameseTranslator
# Initialize translator
translator = VietnameseTranslator()
# Load the model
translator.load_model()
# Translate single text
translated = translator.translate_text("Hello, how are you?")
# Translate batch of texts
texts = ["Text 1", "Text 2", "Text 3"]
translated_batch = translator.translate_batch(texts)
```
### Dictionary Translation
```python
# Translate specific fields in a dictionary
data = {
    "instruction": "Answer the question",
    "input": "What is diabetes?",
    "output": "Diabetes is a metabolic disorder...",
}
translated_data = translator.translate_dict(data, ["instruction", "input", "output"])
```
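To make the fallback behaviour concrete, here is a rough sketch of how per-field translation with a fallback to the original text could work. It is not the actual `translate_dict` implementation: `translate_fields` is a hypothetical function, and the `translate` argument stands in for any callable such as `translator.translate_text`.

```python
import logging
from typing import Any, Callable, Dict, Iterable

logger = logging.getLogger(__name__)


def translate_fields(
    data: Dict[str, Any],
    fields: Iterable[str],
    translate: Callable[[str], str],
) -> Dict[str, Any]:
    """Return a copy of `data` with the named string fields translated.

    Hypothetical helper. Missing, empty, or non-string fields are skipped,
    and a field keeps its original text if `translate` raises.
    """
    result = dict(data)
    for field in fields:
        value = result.get(field)
        if not isinstance(value, str) or not value.strip():
            continue
        try:
            result[field] = translate(value)
        except Exception:
            # Fallback: log the failure and leave the original text in place.
            logger.exception("Translation failed for field %r", field)
    return result
```

Returning a copy rather than mutating the input keeps the original English row available, which is useful if both versions need to be stored.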
## Integration
The translation functionality is automatically integrated into the processing workflows:
1. **UI Toggle**: Users can enable Vietnamese translation via the checkbox in the web interface
2. **SFT Processing**: All text fields in SFT format are translated when enabled
3. **RAG Processing**: All text fields in RAG format are translated when enabled
4. **Metadata**: Translated rows are marked with `vietnamese_translated: true` in metadata
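The metadata step can be pictured with a short sketch. The row layout (a top-level `metadata` dictionary) and the helper name `mark_translated` are assumptions made for illustration; only the `vietnamese_translated` key comes from the list above.

```python
from typing import Any, Dict


def mark_translated(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `row` whose metadata records that it was translated.

    Hypothetical helper; assumes each row carries an optional "metadata" dict.
    """
    marked = dict(row)
    # Copy the metadata so the caller's row is not modified.
    metadata = dict(marked.get("metadata") or {})
    metadata["vietnamese_translated"] = True  # serialised as `true` in JSON
    marked["metadata"] = metadata
    return marked
```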
## Model Information
- **Model**: Helsinki-NLP/opus-mt-en-vi
- **Source Language**: English
- **Target Language**: Vietnamese
- **BLEU Score**: 37.2 (Tatoeba eng-vie test set, as reported on the model card)
- **chrF Score**: 0.542 (same test set)
- **License**: Apache 2.0
## Testing
Run the test script to verify translation functionality:
```bash
python test_translation.py
```
## Files
- `translator.py`: Main translation class
- `download.py`: Model download script for Docker
- `processing_utils.py`: Utility functions for processing integration
- `__init__.py`: Module initialization
- `README.md`: This documentation
## Notes
- The model is automatically downloaded during Docker build
- Translation runs on the CPU by default and can use a GPU when one is available
- The model requires the target language token `>>vie<<` for proper translation
- All translation operations include comprehensive error handling and logging
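As an illustration of the target language token note, the sketch below prepends `>>vie<<` to the input text before it would be tokenized. Whether `translator.py` does this internally is not stated here, so `add_target_token` is a hypothetical helper rather than the module's own code.

```python
TARGET_TOKEN = ">>vie<<"


def add_target_token(text: str, token: str = TARGET_TOKEN) -> str:
    """Prefix `text` with the target language token unless it is already there.

    Hypothetical helper, not taken from translator.py.
    """
    stripped = text.strip()
    if stripped.startswith(token):
        return stripped
    return f"{token} {stripped}"


if __name__ == "__main__":
    print(add_target_token("Hello, how are you?"))
```

Making the helper idempotent means it is safe to call on text that has already been prepared, for example when a batch is retried.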