## English to Spanish Translation AI Model

This repository contains a Transformer-based model fine-tuned for English-to-Spanish translation. The model has been trained, quantized to FP16, and evaluated for translation quality. It delivers high-accuracy translations and is suitable for real-world use cases such as educational tools, real-time communication, and travel assistants.

---
## Features

- **Language Pair**: English → Spanish
- **Model**: Helsinki-NLP/opus-mt-en-es
- **Quantized**: FP16 for efficient inference
- **High Accuracy**: scored well on validation sets
- **CUDA Enabled**: fast training and inference
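For readers who want to try the model directly, here is a minimal inference sketch using the Transformers `pipeline` API. It loads the base checkpoint named above; substitute the path of your fine-tuned checkpoint if you have one.

```python
from transformers import pipeline

# Base checkpoint from this README; swap in a fine-tuned checkpoint path if available.
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")

result = translator("The weather is nice today.")
print(result[0]["translation_text"])
```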
---
## Dataset Used

**Hugging Face Dataset**: **OscarNav/spa-eng**

- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs
```python
from datasets import load_dataset

dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```
## Model Training & Fine-Tuning

- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: `AutoTokenizer` from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (with early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
## Quantization (FP16)

The model was converted to FP16 half precision to reduce memory usage and speed up inference without compromising translation quality.
```python
# Convert the model weights to half precision (FP16) and save the smaller checkpoint
model = model.half()
model.save_pretrained("quantized_model_fp16")
```
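The memory effect of `.half()` can be demonstrated on any PyTorch module. The sketch below uses a small `nn.Linear` as a stand-in for the translation model and compares parameter memory before and after conversion:

```python
from torch import nn

layer = nn.Linear(512, 512)  # stand-in for the full translation model
fp32_bytes = sum(p.element_size() * p.numel() for p in layer.parameters())

layer = layer.half()  # same call used to quantize the model above
fp16_bytes = sum(p.element_size() * p.numel() for p in layer.parameters())

print(fp32_bytes, fp16_bytes)  # FP16 weights occupy half the memory
```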
## Scoring

- BLEU Score: ~34+ (measured with sacrebleu on the validation set)
- Inference Accuracy: verified on real-world sample sentences