Nepali–English Bidirectional Speech Translation

EN ⇄ NE | ASR & Speech Translation

This repository contains a bidirectional English–Nepali speech recognition and translation system. The model enables speech-to-text recognition and speech translation in both directions (EN ⇄ NE).

Model Description

This project combines automatic speech recognition (ASR) and machine translation (MT) to support low-resource language speech translation.

ASR is performed using Wav2Vec2, converting raw audio into text.
Translation is handled by a Transformer-based encoder–decoder model.
The system supports:
- English → Nepali: English speech is transcribed, then translated into Nepali text
- Nepali → English: Nepali speech is transcribed, then translated into English text
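
The two-stage flow described above (ASR first, then translation) can be sketched as a small cascade. This is a minimal illustration rather than the repository's actual API: the function name `translate_speech`, the `SpeechTranslationResult` type, the language codes `"en"`/`"ne"`, and the idea of passing one ASR callable per language and one translation callable per direction are all assumptions made for the example. Stub functions stand in for the real Wav2Vec2 and translation checkpoints, whose names are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical type aliases; the model card does not specify an API.
Audio = Sequence[float]          # raw mono waveform samples
AsrFn = Callable[[Audio], str]   # speech -> transcript in the source language
MtFn = Callable[[str], str]      # source-language text -> target-language text


@dataclass
class SpeechTranslationResult:
    source_lang: str
    target_lang: str
    transcript: str    # output of the ASR stage
    translation: str   # output of the translation stage


def translate_speech(
    audio: Audio,
    source_lang: str,
    asr_models: Dict[str, AsrFn],
    mt_models: Dict[str, MtFn],
) -> SpeechTranslationResult:
    """Run ASR, then translation, for one utterance (a cascade)."""
    if source_lang not in ("en", "ne"):
        raise ValueError(f"unsupported source language: {source_lang!r}")
    # The target is always the other language of the EN/NE pair.
    target_lang = "ne" if source_lang == "en" else "en"

    transcript = asr_models[source_lang](audio)
    translation = mt_models[f"{source_lang}-{target_lang}"](transcript)
    return SpeechTranslationResult(source_lang, target_lang, transcript, translation)


if __name__ == "__main__":
    # Stub models: placeholders for the real checkpoints.
    asr = {"en": lambda audio: "hello", "ne": lambda audio: "नमस्ते"}
    mt = {"en-ne": lambda text: "नमस्ते", "ne-en": lambda text: "hello"}
    print(translate_speech([0.0] * 16000, "en", asr, mt))
```

Keeping the transcript in the result makes it easier to tell whether an error came from the recognition stage or the translation stage.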

The model was trained using the Hugging Face Transformers library with TensorFlow.

Intended Uses & Limitations

Intended Uses

- Speech-to-text applications for English and Nepali
- Speech translation systems
- Research on low-resource language processing
- Educational and academic projects

Limitations

- Performance may degrade on noisy or accented speech
- Accuracy is limited by the size and quality of available Nepali datasets
- Not optimized for real-time deployment on low-end devices

Training and Evaluation Data

The model was trained on curated English–Nepali speech and parallel text datasets. Due to dataset licensing constraints, exact dataset details are not publicly listed.

Training Procedure

The model was trained from scratch using a Transformer-based architecture.

Training Hyperparameters

Optimizer: AdamWeightDecay
Learning Rate: 2e-05
Weight Decay: 0.01
Precision: float32
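
To illustrate what these settings mean in practice, here is a minimal pure-Python sketch of one Adam update with decoupled weight decay, using the learning rate and weight decay listed above. It is not the library's implementation: the function name `adamw_step` is hypothetical, it handles a single scalar parameter, and the values of `beta1`, `beta2`, and `eps` are assumed defaults that this model card does not state.

```python
import math


def adamw_step(param, grad, m, v, step,
               lr=2e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with decoupled weight decay for a scalar parameter.

    lr and weight_decay come from the model card; beta1, beta2 and eps
    are assumed values, not taken from the model card.
    """
    # Decoupled weight decay: shrink the parameter directly instead of
    # adding an L2 term to the gradient.
    param = param - lr * weight_decay * param

    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # Bias correction for the zero-initialised averages (step starts at 1).
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Adam step, scaled by the learning rate.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    p, m, v = 1.0, 0.0, 0.0
    for t in range(1, 4):
        p, m, v = adamw_step(p, grad=0.5, m=m, v=v, step=t)
        print(t, p)
```

With a learning rate this small, each step changes a parameter by about 2e-05 at most, and the decay term is smaller still by a factor of the weight decay.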

Training Results

Train Loss	Validation Loss	Epoch
1.2241	1.2162	5
1.1792	1.1920	6
1.1424	1.1731	7
1.1101	1.1592	8
1.0812	1.1455	9

Framework Versions

Transformers: 4.48.3
TensorFlow: 2.18.0
Datasets: 3.3.2
Tokenizers: 0.21.0
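
When reproducing results, it can help to check the local environment against the versions listed above. The following is a small sketch using only the standard library; the function name `find_version_mismatches` is hypothetical, and it assumes the packages are installed under their usual PyPI distribution names (a variant build such as a CPU-only TensorFlow distribution would be reported as not installed).

```python
from importlib import metadata

# Versions listed in this model card.
PINNED = {
    "transformers": "4.48.3",
    "tensorflow": "2.18.0",
    "datasets": "3.3.2",
    "tokenizers": "0.21.0",
}


def find_version_mismatches(pinned=PINNED, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed under this name
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for package, (expected, installed) in find_version_mismatches().items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means the installed versions match the ones listed here.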

Author

Developed as part of a speech AI project focused on English–Nepali bidirectional translation.
