## 🧠 English to Spanish Translation AI Model

This repository contains a Transformer-based model fine-tuned for English-to-Spanish text translation. The model has been trained, quantized to FP16, and evaluated for translation quality. It delivers high-accuracy translations and suits real-world use cases such as educational tools, real-time communication, and travel assistants.
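
A quick way to try the base checkpoint is the `transformers` `pipeline` API. A minimal sketch (to use the fine-tuned weights from this repository instead, point `model=` at the local model directory):

```python
from transformers import pipeline

# Translation pipeline built on the same base checkpoint used for
# fine-tuning (the model is downloaded on first use).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

result = translator("How are you today?")
print(result[0]["translation_text"])
```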

---

## 🚀 Features

- 🔁 **Language Pair**: English → Spanish
- 🔧 **Model**: Helsinki-NLP/opus-mt-en-es
- 🧪 **Quantized**: FP16 for efficient inference
- 🎯 **High Accuracy**: ~34 BLEU on the validation set
- ⚡ **CUDA Enabled**: Fast training and inference

---

## 📊 Dataset Used

**Hugging Face Dataset**: **OscarNav/spa-eng**

- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs

```python
from datasets import load_dataset

# English–Spanish parallel corpus (~107K sentence pairs)
dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```

## 🛠️ Model Training & Fine-Tuning

- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: AutoTokenizer from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (based on early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss


## 🧪 Quantization (FP16)

The model was quantized to FP16 to reduce memory usage and speed up inference without compromising translation quality.

```python
# Convert model weights to half precision (FP16), then save
model = model.half()
model.save_pretrained("quantized_model_fp16")
```
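
The effect of `.half()` is easy to verify in plain PyTorch: each weight shrinks from 4 bytes (FP32) to 2 bytes (FP16), halving parameter memory. A small standalone check (the layer size is arbitrary):

```python
import torch
from torch import nn

layer = nn.Linear(512, 512)      # parameters are FP32 by default
fp32_bytes = layer.weight.element_size() * layer.weight.nelement()

layer = layer.half()             # convert parameters to FP16
fp16_bytes = layer.weight.element_size() * layer.weight.nelement()

print(layer.weight.dtype)        # torch.float16
print(fp32_bytes // fp16_bytes)  # 2
```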

## ✅ Scoring

- BLEU Score: ~34
- Evaluation Metric: sacrebleu on the validation set
- Inference Accuracy: Verified on real-world sample sentences