## English to Spanish Translation AI Model
This repository contains a Transformer-based AI model fine-tuned for English to Spanish text translation. The model has been trained, quantized (FP16), and tested for quality and scoring. It delivers high-accuracy translations and is suitable for real-world use cases such as educational tools, real-time communication, and travel assistants.
---
## Features
- **Language Pair**: English → Spanish
- **Model**: Helsinki-NLP/opus-mt-en-es
- **Quantized**: FP16 for efficient inference
- **High Accuracy**: BLEU ≈ 34 on the validation set
- **CUDA Enabled**: Fast training and inference
---
## Dataset Used
**Hugging Face Dataset**: `OscarNav/spa-eng`
- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs
```python
from datasets import load_dataset
dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```
## Model Training & Fine-Tuning
- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: AutoTokenizer from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (based on early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
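The optimizer/loss wiring listed above can be sketched in plain PyTorch. This is a toy model standing in for the Marian encoder-decoder (the actual run fine-tuned the pretrained checkpoint on GPU); the point is how AdamW and CrossEntropyLoss connect in the training loop:

```python
import torch
from torch import nn

# Toy stand-in for the seq2seq model: embed token ids, project to vocab logits.
vocab_size, hidden = 32, 16
model = nn.Sequential(nn.Embedding(vocab_size, hidden),
                      nn.Linear(hidden, vocab_size))

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # AdamW, as above
loss_fn = nn.CrossEntropyLoss()                             # token-level CE loss

torch.manual_seed(0)
inputs = torch.randint(0, vocab_size, (16, 8))   # batch of 16, seq_len 8
targets = torch.randint(0, vocab_size, (16, 8))  # target token ids

losses = []
for epoch in range(5):  # the real run used 3-5 epochs with early stopping
    optimizer.zero_grad()
    logits = model(inputs)                                   # (16, 8, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```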
## Quantization (FP16)
The model was cast to FP16 to reduce memory usage and speed up inference without compromising translation quality.
```python
# `model` is the fine-tuned Helsinki-NLP/opus-mt-en-es checkpoint
model = model.half()  # cast all weights to FP16
model.save_pretrained("quantized_model_fp16")
```
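The effect of `.half()` can be checked on any module without downloading the full checkpoint; this small sketch confirms the dtype change and the memory saving:

```python
import torch

# A small layer stands in for the full Marian model; the call is the same.
layer = torch.nn.Linear(512, 512)
fp32_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

layer = layer.half()  # same cast applied to the translation model
fp16_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

print(next(layer.parameters()).dtype)  # torch.float16
print(fp32_bytes // fp16_bytes)        # parameter memory is halved
```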
## Scoring
- BLEU Score: ~34
- Evaluation Metric: sacrebleu on validation set
- Inference Accuracy: Verified using real-world sample sentences