---
language:
- en
- ar
license: mit
tags:
- translation
- seq2seq
metrics:
- accuracy
---

# LinguaFlow
### *Advanced English-to-Arabic Neural Machine Translation*
[MIT License](https://opensource.org/licenses/MIT) | [Python](https://www.python.org/) | [TensorFlow](https://tensorflow.org/) | [Model on Hugging Face](https://huggingface.co/Ali0044/LinguaFlow)
---
## Overview
**LinguaFlow** is a robust Sequence-to-Sequence (Seq2Seq) neural machine translation model specialized in converting English text into Arabic. Leveraging a deep learning architecture based on **LSTM (Long Short-Term Memory)**, it captures complex linguistic relationships and contextual nuances to provide high-quality translations for short-to-medium length sentences.
### Key Features
- **LSTM-Based Architecture**: High-efficiency encoder-decoder framework.
- **Domain Specificity**: Optimized for the `salehalmansour/english-to-arabic-translate` dataset.
- **Easy Integration**: Simple Python API for quick deployment.
- **Bilingual Support**: Full English-to-Arabic vocabulary coverage (En: 6,400+ | Ar: 9,600+).
---
## Technical Architecture
The model employs an **Encoder-Decoder** topology designed for sequence transduction tasks.
```mermaid
graph LR
A[English Input Sequence] --> B[Embedding Layer]
B --> C[LSTM Encoder]
C --> D[Context Vector]
D --> E[Repeat Vector]
E --> F[LSTM Decoder]
F --> G[Dense Layer / Softmax]
G --> H[Arabic Output Sequence]
```
### Configuration Highlights
| Component | Specification |
| :--- | :--- |
| **Model Type** | Seq2Seq LSTM |
| **Hidden Units** | 512 |
| **Embedding Size** | 512 |
| **Input Length** | 20 timesteps |
| **Output Length** | 20 timesteps |
| **Optimizer** | Adam |
| **Loss Function** | Sparse Categorical Crossentropy |
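The Repeat Vector step in the diagram above tiles the encoder's single context vector across every decoder timestep. A minimal NumPy sketch of that shape transformation (batch size of 2 is arbitrary; the 512 hidden units and 20 timesteps come from the table):

```python
import numpy as np

batch, timesteps, hidden = 2, 20, 512

# The encoder compresses each input sequence into one context vector of
# shape (batch, hidden) -- stubbed here with random values.
context = np.random.rand(batch, hidden)

# RepeatVector tiles that context across all decoder timesteps, so the
# LSTM decoder sees the same sentence summary at each of its 20 steps.
repeated = np.repeat(context[:, np.newaxis, :], timesteps, axis=1)

print(repeated.shape)  # (2, 20, 512)
```

Every decoder step therefore conditions on an identical copy of the context vector, which is the main limitation the attention mechanism on the roadmap is meant to address.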
---
## Performance Benchmark
Training and validation accuracy differ by only 0.25 points, indicating that LinguaFlow generalizes well within its training domain.
| Metric | Training | Validation |
| :--- | :--- | :--- |
| **Accuracy** | 85.99% | 85.74% |
| **Loss** | 0.9594 | 1.1926 |
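The loss figures above are sparse categorical crossentropy: the mean negative log-probability the model assigns to the correct Arabic token at each timestep, with integer targets rather than one-hot vectors. A self-contained illustration with made-up probabilities over a toy 4-word vocabulary:

```python
import numpy as np

# Predicted distribution over a toy 4-word vocabulary at 3 timesteps
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
# Integer target indices (the "sparse" part: no one-hot encoding needed)
targets = np.array([0, 1, 3])

# Mean negative log-probability of the correct token at each timestep
loss = -np.log(probs[np.arange(len(targets)), targets]).mean()
print(round(loss, 4))  # 0.4688
```

Lower loss means higher confidence in the correct tokens, which is why the training loss (0.9594) sits below the validation loss (1.1926).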
---
## Getting Started
### Prerequisites
```bash
pip install tensorflow numpy pandas scikit-learn huggingface_hub
```
### Usage Example
```python
from huggingface_hub import snapshot_download
import tensorflow as tf
import numpy as np
import os
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences
# 1. Download model and tokenizers
repo_id = "Ali0044/LinguaFlow"
local_dir = snapshot_download(repo_id=repo_id)
# 2. Load resources
model = tf.keras.models.load_model(os.path.join(local_dir, "Translation_model.keras"))
with open(os.path.join(local_dir, "eng_tokenizer.pkl"), "rb") as f:
    eng_tokenizer = pickle.load(f)
with open(os.path.join(local_dir, "ar_tokenizer.pkl"), "rb") as f:
    ar_tokenizer = pickle.load(f)

# 3. Translation function
def translate(sentences):
    # Tokenize and pad to the model's fixed 20-timestep input length
    seq = eng_tokenizer.texts_to_sequences(sentences)
    padded = pad_sequences(seq, maxlen=20, padding='post')
    # Predict a distribution over the Arabic vocabulary at each timestep,
    # then take the most likely token index
    preds = np.argmax(model.predict(padded), axis=-1)
    results = []
    for s in preds:
        # Skip padding (index 0) and any index missing from the tokenizer
        words = [ar_tokenizer.index_word.get(i, '') for i in s if i != 0]
        results.append(' '.join(w for w in words if w))
    return results
# 4. Try it out!
print(translate(["Hello, how are you?"]))
```
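Because `pad_sequences(..., maxlen=20)` silently truncates anything beyond 20 tokens, longer inputs lose their tail. A small hypothetical helper (`chunk_sentence` is not part of the model's API) that splits long text into chunks of at most 20 words before passing each one to `translate`:

```python
def chunk_sentence(text, max_words=20):
    """Split text into chunks of at most max_words words,
    matching the model's fixed 20-timestep input window."""
    words = text.split()
    return [' '.join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

chunks = chunk_sentence("one two three " * 10)   # 30 words -> 2 chunks
print([len(c.split()) for c in chunks])  # [20, 10]
```

Each chunk can then be translated individually, at the cost of losing cross-chunk context.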
---
## Limitations & Ethical Notes
- **Maximum Length**: Best results are achieved with sentences up to 20 words.
- **Domain Bias**: Accuracy may vary when translating specialized technical or medical jargon not present in the training set.
- **Bias**: As with all language models, potential biases in the open-source dataset may occasionally be reflected in translations.
---
## Roadmap
- [ ] Implement Attention Mechanism (Bahdanau/Luong).
- [ ] Upgrade to Transformer architecture (Base/Large).
- [ ] Expand sequence length support to 50+ tokens.
- [ ] Continuous training on larger Arabic datasets (e.g., OPUS).
---
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
---