## English to Spanish Translation AI Model
This repository contains a Transformer-based AI model fine-tuned for English to Spanish text translation. The model has been trained, quantized (FP16), and tested for quality and scoring. It delivers high-accuracy translations and is suitable for real-world use cases such as educational tools, real-time communication, and travel assistants.
---
## Features
- **Language Pair**: English → Spanish
- **Model**: Helsinki-NLP/opus-mt-en-es
- **Quantized**: FP16 for efficient inference
- **High Accuracy**: BLEU ≈ 34 on the validation set
- **CUDA Enabled**: Fast training and inference
---
## Dataset Used
**Hugging Face Dataset**: `OscarNav/spa-eng`
- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs
```python
from datasets import load_dataset
dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```
## Model Training & Fine-Tuning
- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: AutoTokenizer from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (based on early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
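The optimizer/loss wiring listed above can be sketched in plain PyTorch. This is a toy model standing in for the Marian encoder-decoder (the actual run fine-tuned the pretrained checkpoint on GPU); the point is how AdamW and CrossEntropyLoss connect in the training loop:

```python
import torch
from torch import nn

# Toy stand-in for the seq2seq model: embed token ids, project to vocab logits.
vocab_size, hidden = 32, 16
model = nn.Sequential(nn.Embedding(vocab_size, hidden),
                      nn.Linear(hidden, vocab_size))

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)  # AdamW, as above
loss_fn = nn.CrossEntropyLoss()                             # token-level CE loss

torch.manual_seed(0)
inputs = torch.randint(0, vocab_size, (16, 8))   # batch of 16, seq_len 8
targets = torch.randint(0, vocab_size, (16, 8))  # target token ids

losses = []
for epoch in range(5):  # the real run used 3-5 epochs with early stopping
    optimizer.zero_grad()
    logits = model(inputs)                                   # (16, 8, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```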
## Quantization (FP16)
The model was cast to FP16 to reduce memory usage and speed up inference without compromising translation quality.
```python
# `model` is the fine-tuned Helsinki-NLP/opus-mt-en-es checkpoint
model = model.half()  # cast all weights to FP16
model.save_pretrained("quantized_model_fp16")
```
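The effect of `.half()` can be checked on any module without downloading the full checkpoint; this small sketch confirms the dtype change and the memory saving:

```python
import torch

# A small layer stands in for the full Marian model; the call is the same.
layer = torch.nn.Linear(512, 512)
fp32_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

layer = layer.half()  # same cast applied to the translation model
fp16_bytes = sum(p.numel() * p.element_size() for p in layer.parameters())

print(next(layer.parameters()).dtype)  # torch.float16
print(fp32_bytes // fp16_bytes)        # parameter memory is halved
```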
## Scoring
- BLEU Score: ~34
- Evaluation Metric: sacrebleu on validation set
- Inference Accuracy: Verified using real-world sample sentences