TorongoXetu - Assamese ASR Model
torongoXetu-asr is an Automatic Speech Recognition (ASR) model for the Assamese language. Built on the NVIDIA NeMo framework with a Conformer architecture, it transcribes Assamese audio to text.
A lightweight Python library, torongoxetu, is included for inference.
pip install torongoxetu
Live Demo
https://huggingface.co/spaces/ananddey/torongoXetu-asr
Model Overview
| Attribute | Details |
|---|---|
| Language | Assamese (as) |
| Architecture | Conformer |
| Tokenizer | BPE |
| Training Data | Assamese ASR Dataset (~135 hours) |
| Hardware | NVIDIA L40S GPU |
| Training Time | ~12 hours |
Getting Started
Prerequisites
- Python 3.10+
- Virtual environment (recommended)
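The two prerequisites above can be checked from Python before installing anything. This is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical and not part of the torongoxetu package.

```python
import sys


def check_environment(min_version=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed.

    `check_environment` is an illustrative helper, not part of torongoxetu.
    """
    problems = []

    # The README asks for Python 3.10 or newer.
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted}+ required, found {found}")

    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base interpreter.
    if sys.prefix == sys.base_prefix:
        problems.append("not running inside a virtual environment (recommended)")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Running the script prints one warning per unmet prerequisite and nothing when both are satisfied.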
Installation
Important: Follow these steps in order to avoid dependency conflicts.
# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# Step 1: Install the AI4Bharat NeMo fork first (without its dependencies)
pip install git+https://github.com/AI4Bharat/NeMo.git --no-deps
# Step 2: Install core dependencies
pip install -r requirements.txt
Usage
Quick Test
Run the included test script to verify everything works:
python inference.py
This transcribes the sample test.wav file and prints the Assamese text from the audio.
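When moving from the sample file to your own recordings, it can help to look at a WAV file's basic properties first. The sketch below only reports what the header says, using the standard-library `wave` module. This README does not state which sample rate or channel layout the model expects, so no requirement is enforced here. The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path):
    """Read basic header properties from an uncompressed PCM WAV file.

    `describe_wav` is an illustrative helper, not part of torongoxetu.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav.getsampwidth(),
            # Duration is frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("test.wav")` returns a dictionary describing the bundled sample file.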
Python API
from torongoxetu import TorongoModel
# Load model
model = TorongoModel("torongoXetu-asr.nemo")
# Single file transcription
text = model.transcribe("audio.wav")
print(text)
# Batch transcription
texts = model.transcribe(["file1.wav", "file2.wav"], batch_size=4)
print(texts)
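Batch transcription extends naturally to a whole folder of recordings. The following is a minimal sketch, not part of the library: `transcribe_directory` is a hypothetical helper that takes any callable with the same shape as `model.transcribe` above (a list of paths plus `batch_size`), and it assumes the returned texts come back in the same order as the input paths.

```python
from pathlib import Path


def transcribe_directory(transcribe, directory, batch_size=4, pattern="*.wav"):
    """Transcribe every file matching `pattern` in `directory`.

    `transcribe` is any callable shaped like `model.transcribe(paths, batch_size=...)`.
    Assumption: it returns one text per input path, in input order.
    Returns a dict mapping each file path to its transcription.
    """
    # Sort so the result order is stable across runs and platforms.
    files = sorted(str(p) for p in Path(directory).glob(pattern))
    if not files:
        return {}
    texts = transcribe(files, batch_size=batch_size)
    return dict(zip(files, texts))
```

With a loaded model, a call might look like `transcribe_directory(model.transcribe, "recordings")`, where `recordings` is a placeholder folder name.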
Web Demo (Local)
Launch the interactive web interface:
python app.py
Open the URL shown in the terminal (typically http://127.0.0.1:7860). You can upload audio files, record directly, or try the included samples.
Use Cases
- Speech-to-text applications for Assamese
- Voice assistants and transcription services
- Research and academic projects
- Subtitle generation and building ASR tools
Limitations
- Trained for Assamese only; audio in other languages may not be transcribed well
License
MIT License
Author
Anand Dey
📧 ananddey.nic@gmail.com