Nepali–English Bidirectional Speech Translation
EN ⇄ NE | ASR & Speech Translation
This repository contains a bidirectional English–Nepali speech recognition and translation system. The model enables speech-to-text recognition and speech translation in both directions (EN ⇄ NE).
Model Description
This project combines automatic speech recognition (ASR) and machine translation (MT) to support speech translation for a low-resource language pair.
- ASR is performed using Wav2Vec2, converting raw audio into text.
- Translation is handled by a Transformer-based encoder–decoder model.
- The system supports:
  - English → Nepali speech recognition & translation
  - Nepali → English speech recognition & translation
The model was trained using the Hugging Face Transformers library with TensorFlow.
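
The two-stage design described above (ASR first, then translation of the transcript) can be sketched as a simple cascade. This is a minimal illustration, not the repository's actual interface: `translate_speech`, `fake_asr`, and `fake_mt` are hypothetical names, and the stand-in components only mimic what a Wav2Vec2 recognizer and an encoder–decoder translator would return.

```python
from typing import Callable, Sequence

# Hypothetical type aliases; the repository does not document its real interfaces.
Audio = Sequence[float]               # raw mono waveform samples
Recognizer = Callable[[Audio], str]   # ASR stage: audio -> source-language text
Translator = Callable[[str], str]     # MT stage: source text -> target-language text


def translate_speech(audio: Audio, recognize: Recognizer, translate: Translator) -> dict:
    """Run the cascade: recognise speech first, then translate the transcript."""
    transcript = recognize(audio).strip()
    if not transcript:
        # Nothing was recognised, so there is nothing to translate.
        return {"transcript": "", "translation": ""}
    return {"transcript": transcript, "translation": translate(transcript)}


# Stand-in components (assumptions) so the sketch runs without model weights.
def fake_asr(audio: Audio) -> str:
    return "hello" if len(audio) > 0 else ""


def fake_mt(text: str) -> str:
    return {"hello": "नमस्ते"}.get(text, text)


result = translate_speech([0.0, 0.1, -0.1], fake_asr, fake_mt)
print(result)
```

Keeping the two stages behind plain callables means the same function covers both directions: swapping in a Nepali recognizer and a Nepali → English translator changes the components, not the cascade.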
Intended Uses & Limitations
Intended Uses
- Speech-to-text applications for English and Nepali
- Speech translation systems
- Research on low-resource language processing
- Educational and academic projects
Limitations
- Performance may degrade on noisy or accented speech
- Limited by the size and quality of available Nepali datasets
- Not optimized for real-time deployment on low-end devices
Training and Evaluation Data
The model was trained on curated English–Nepali speech and parallel text datasets. Due to dataset licensing constraints, exact dataset details are not publicly listed.
Training Procedure
The model was trained from scratch using a Transformer-based architecture.
Training Hyperparameters
- Optimizer: AdamWeightDecay
- Learning Rate: 2e-05
- Weight Decay: 0.01
- Precision: float32
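
To make the optimizer settings concrete, the sketch below applies one Adam update with decoupled weight decay to a single scalar weight, using the learning rate and weight decay listed above. It illustrates the general technique only, not the library's exact implementation. The `beta1`, `beta2`, and `eps` values are assumed common defaults, since the card does not state them, and `adamw_step` is a hypothetical helper name.

```python
import math


def adamw_step(w, grad, m, v, t, lr=2e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-7):
    """One scalar Adam update with decoupled weight decay. Returns (w, m, v).

    lr and weight_decay come from the card; beta1, beta2 and eps are assumptions.
    """
    # Decoupled decay: shrink the weight directly instead of adding an L2 term to the gradient.
    w = w - lr * weight_decay * w
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages (t is the 1-based step count).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


w, m, v = adamw_step(w=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(f"{w:.7f}")
```

On the first step the bias-corrected Adam term is close to `lr` times the sign of the gradient, so the weight moves by roughly 2e-05, while the decay term contributes only about 2e-07.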
Training Results
| Train Loss | Validation Loss | Epoch |
|---|---|---|
| 1.2241 | 1.2162 | 5 |
| 1.1792 | 1.1920 | 6 |
| 1.1424 | 1.1731 | 7 |
| 1.1101 | 1.1592 | 8 |
| 1.0812 | 1.1455 | 9 |
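
One way to read the table is to track the gap between validation and training loss per epoch. The sketch below computes it from the values above; `generalization_gaps` is a hypothetical helper name, and the numbers are copied directly from the table.

```python
# (epoch, train_loss, validation_loss) copied from the results table
results = [
    (5, 1.2241, 1.2162),
    (6, 1.1792, 1.1920),
    (7, 1.1424, 1.1731),
    (8, 1.1101, 1.1592),
    (9, 1.0812, 1.1455),
]


def generalization_gaps(rows):
    """Return (epoch, validation_loss - train_loss) per row, rounded to 4 decimals."""
    return [(epoch, round(val - train, 4)) for epoch, train, val in rows]


gaps = generalization_gaps(results)
for epoch, gap in gaps:
    print(f"epoch {epoch}: gap {gap:+.4f}")
```

Both losses fall at every listed epoch, while the gap grows from slightly negative at epoch 5 to about 0.064 at epoch 9, which is the kind of trend worth watching if training were continued.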
Framework Versions
- Transformers: 4.48.3
- TensorFlow: 2.18.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0
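
To reproduce the environment, it can help to compare the installed packages against the versions listed above. The following is a small sketch using only the standard library; the helper names are hypothetical, and the distribution names are assumed to be the lowercase forms of the names in the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the list above; lowercase distribution names are an assumption.
EXPECTED = {
    "transformers": "4.48.3",
    "tensorflow": "2.18.0",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```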
Author
Developed as part of a speech AI project focused on English–Nepali bidirectional translation.