## 🧠 English to Spanish Translation AI Model
This repository contains a Transformer-based model, fine-tuned from Helsinki-NLP/opus-mt-en-es, for English-to-Spanish text translation. The model was fine-tuned, converted to FP16, and evaluated with sacrebleu (BLEU ~34 on the validation set). It is intended for real-world use cases such as educational tools, real-time communication, and travel assistants.
---
## 🚀 Features
- 🔁 **Language Pair**: English → Spanish
- 🔧 **Model**: Helsinki-NLP/opus-mt-en-es
- 🧪 **Quantized**: FP16 for efficient inference
- 🎯 **High Accuracy**: BLEU ~34 (sacrebleu) on the validation set
- ⚡ **CUDA Enabled**: Fast training and inference
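
A minimal inference sketch, assuming the FP16 checkpoint was saved to a local `quantized_model_fp16` directory and that the base model's tokenizer is reused unchanged. `translate` and `pick_device_and_dtype` are hypothetical helper names, not part of this repository. The sketch falls back to FP32 on CPU, since half-precision generation is mainly useful on a CUDA GPU.

```python
def pick_device_and_dtype(cuda_available):
    """Choose FP16 on a CUDA GPU and FP32 on CPU (an assumed policy)."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def translate(texts, model_dir="quantized_model_fp16",
              tokenizer_name="Helsinki-NLP/opus-mt-en-es"):
    """Translate a list of English sentences to Spanish."""
    # Heavy imports stay local so the helper above works without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_dir, torch_dtype=getattr(torch, dtype_name)
    )
    model = model.to(device).eval()

    # Pad to the longest sentence in the batch and move tensors to the device.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    batch = batch.to(device)
    with torch.no_grad():
        output_ids = model.generate(**batch, max_new_tokens=128)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["Where is the train station?"])` would return a one-element list containing the Spanish translation. The `max_new_tokens=128` limit is an arbitrary choice for short sentences.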
---
## 📊 Dataset Used
**Hugging Face Dataset**: **OscarNav/spa-eng**
- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs
```python
from datasets import load_dataset
dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```
## 🛠️ Model Training & Fine-Tuning
- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: AutoTokenizer from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (based on early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
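
The "3–5 epochs, based on early stopping" setting can be read as: always train at least 3 epochs, never more than 5, and stop in between once validation loss stops improving. The sketch below illustrates that reading. The `epochs_to_run` helper and the `patience=1` default are assumptions, not the exact rule used in the original notebook.

```python
def epochs_to_run(val_losses, min_epochs=3, max_epochs=5, patience=1):
    """Return how many epochs would run, given per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss, but never before `min_epochs` and never
    after `max_epochs`.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, bad_epochs = loss, 0  # new best: reset the counter
        else:
            bad_epochs += 1             # no improvement this epoch
        if epoch >= min_epochs and bad_epochs >= patience:
            return epoch
    return min(len(val_losses), max_epochs)
```

For example, with validation losses `[2.0, 1.5, 1.6, 1.4, 1.3]` the loss worsens at epoch 3, so training would stop there. If the loss improves every epoch, the run ends at the 5-epoch cap.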
## 🧪 Quantization (FP16)
The model weights were cast from FP32 to FP16 (half precision). This halves the memory needed for the weights and typically speeds up GPU inference, without compromising translation quality.
```python
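# Assumes `model` is the fine-tuned model, already loaded in memory.
model = model.half()  # cast all floating-point weights to FP16
model.save_pretrained("quantized_model_fp16")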
```
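
To make the memory saving concrete, here is a small back-of-the-envelope sketch. FP32 stores each weight in 4 bytes and FP16 in 2 bytes, so weight memory is halved. The parameter count below is a placeholder for illustration, not a measured value for this model. The real count can be obtained with `sum(p.numel() for p in model.parameters())`.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mib(num_params, dtype):
    """Approximate memory used by the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Hypothetical parameter count, chosen only to illustrate the arithmetic.
num_params = 75_000_000
fp32_mib = weight_memory_mib(num_params, "float32")
fp16_mib = weight_memory_mib(num_params, "float16")
print(f"FP32: {fp32_mib:.0f} MiB, FP16: {fp16_mib:.0f} MiB")
```

This covers the weights only. Activations and generation caches add to the total at inference time.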
## ✅ Scoring
- BLEU Score: approximately 34 or higher
- Evaluation Metric: sacrebleu on validation set
- Inference Accuracy: Verified using real-world sample sentences
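
A minimal sketch of how a corpus-level BLEU score could be computed with sacrebleu, assuming one reference translation per sentence. `score_bleu` is a hypothetical helper, and the exact evaluation script behind the score above is not shown in this README.

```python
def score_bleu(hypotheses, references):
    """Return the corpus-level BLEU score for model outputs.

    `hypotheses` and `references` are parallel lists of strings,
    with one reference per hypothesis.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")

    import sacrebleu  # local import: only needed when scoring actually runs

    # sacrebleu expects a list of reference streams; a single stream is used here.
    return sacrebleu.corpus_bleu(hypotheses, [references]).score
```

In practice, `hypotheses` would be the model's translations of the validation sentences and `references` the corresponding Spanish sentences from the dataset.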