## English to Spanish Translation AI Model

This repository contains a Transformer-based model fine-tuned for English-to-Spanish translation. The model has been trained, quantized to FP16, and evaluated for translation quality. It delivers high-accuracy translations and is suitable for real-world use cases such as educational tools, real-time communication, and travel assistants.

---
## Features

- **Language Pair**: English → Spanish
- **Model**: Helsinki-NLP/opus-mt-en-es
- **Quantized**: FP16 for efficient inference
- **High Accuracy**: scored well on validation sets
- **CUDA Enabled**: fast training and inference
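For readers who want to try the model directly, here is a minimal inference sketch using the Transformers `pipeline` API. It loads the base checkpoint named above; substitute the path of your fine-tuned checkpoint if you have one.

```python
from transformers import pipeline

# Base checkpoint from this README; swap in a fine-tuned checkpoint path if available.
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")

result = translator("The weather is nice today.")
print(result[0]["translation_text"])
```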
---
## Dataset Used

**Hugging Face Dataset**: **OscarNav/spa-eng**

- Source: OscarNav
- Language Pair: `en-es`
- Dataset Size: ~107K sentence pairs
```python
from datasets import load_dataset

dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
```
## Model Training & Fine-Tuning

- Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
- Tokenizer: `AutoTokenizer` from Hugging Face Transformers
- Training Environment: Kaggle Notebook with CUDA GPU
- Batch Size: 16
- Epochs: 3–5 (with early stopping)
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
## Quantization (FP16)

The model was converted to FP16 half precision to reduce memory usage and speed up inference without compromising translation quality.
```python
# Convert the model weights to half precision (FP16) and save the smaller checkpoint
model = model.half()
model.save_pretrained("quantized_model_fp16")
```
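The memory effect of `.half()` can be demonstrated on any PyTorch module. The sketch below uses a small `nn.Linear` as a stand-in for the translation model and compares parameter memory before and after conversion:

```python
from torch import nn

layer = nn.Linear(512, 512)  # stand-in for the full translation model
fp32_bytes = sum(p.element_size() * p.numel() for p in layer.parameters())

layer = layer.half()  # same call used to quantize the model above
fp16_bytes = sum(p.element_size() * p.numel() for p in layer.parameters())

print(fp32_bytes, fp16_bytes)  # FP16 weights occupy half the memory
```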
## Scoring

- BLEU Score: ~34+ (measured with sacrebleu on the validation set)
- Inference Accuracy: verified on real-world sample sentences