TorongoXetu - Assamese ASR Model

torongoXetu-asr is an Automatic Speech Recognition (ASR) model for the Assamese language. It uses a Conformer architecture, is built on the NVIDIA NeMo framework, and transcribes Assamese speech to text.

A lightweight Python library, torongoxetu, is provided for inference:

pip install torongoxetu 

Live Demo

https://huggingface.co/spaces/ananddey/torongoXetu-asr


Model Overview

| Attribute | Details |
|---|---|
| Language | Assamese (as) |
| Architecture | Conformer |
| Tokenizer | BPE |
| Training data | Assamese ASR Dataset (~135 hours) |
| Hardware | NVIDIA L40S GPU |
| Training time | ~12 hours |

Training Metrics

[Figure: training graph logged with Weights & Biases (WandB)]
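Beyond the training curves, a common way to measure accuracy on your own labelled Assamese recordings is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal, dependency-free sketch; the function name word_error_rate is hypothetical and not part of the torongoxetu library. It splits on whitespace only and does no text normalisation (punctuation, Unicode normalisation of Assamese script), which you may want to add before comparing transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # At the start of each outer iteration, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder transcripts: one substitution (b -> x) and one deletion (d).
    print(word_error_rate("a b c d", "a x c"))
```

In the usage example the two edits over four reference words give a WER of 0.5. For larger evaluations you would average edit counts over all utterances (total edits divided by total reference words) rather than averaging per-utterance WER values.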


Getting Started

Prerequisites

  • Python 3.10+
  • Virtual environment (recommended)

Installation

Important: Follow these steps in order to avoid dependency conflicts.

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/Mac

# Step 1: Install the AI4Bharat NeMo fork first, without its dependencies
pip install git+https://github.com/AI4Bharat/NeMo.git --no-deps

# Step 2: Install the core dependencies
pip install -r requirements.txt
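If inference.py fails right after installation, one way to narrow down the problem is to check the interpreter against the Python 3.10+ prerequisite and confirm that the required packages are installed. The sketch below uses only the standard library; the helper names are hypothetical, and the distribution name nemo_toolkit for the AI4Bharat fork is an assumption (check the output of pip list if it reports a false positive).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 10)  # from the Prerequisites section
# Assumed distribution names: "torongoxetu" matches the pip command at the top of
# this card; "nemo_toolkit" is assumed to be what the AI4Bharat NeMo fork installs as.
REQUIRED_PACKAGES = ("torongoxetu", "nemo_toolkit")


def python_ok(current=None) -> bool:
    """Return True if `current` (default: the running interpreter) is 3.10 or newer."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= MIN_PYTHON


def missing_packages(names=REQUIRED_PACKAGES) -> list:
    """Return the distributions in `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not python_ok():
        print(f"Python {sys.version.split()[0]} found, but 3.10+ is required.")
    for name in missing_packages():
        print(f"Not installed: {name}")
    print("Environment check finished.")
```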

Usage

Quick Test

Run the included test script to verify that the installation works:

python inference.py

This transcribes the bundled sample file test.wav and prints its Assamese transcript.
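When moving from the sample to your own recordings, a common reason for poor ASR results in general is an audio format that differs from what the model expects. Many NeMo Conformer models expect 16 kHz mono audio, but this card does not state the expected format, so the sketch below uses the bundled test.wav as the reference instead of a hard-coded value. It relies on Python's standard wave module, which reads only uncompressed PCM WAV files; the helper names and the script name check_wav.py are hypothetical.

```python
import wave
from typing import NamedTuple


class WavFormat(NamedTuple):
    channels: int
    sample_rate: int   # frames per second (Hz)
    sample_width: int  # bytes per sample (2 = 16-bit PCM)


def wav_format(path: str) -> WavFormat:
    """Read the channel count, sample rate and sample width of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return WavFormat(wf.getnchannels(), wf.getframerate(), wf.getsampwidth())


def format_mismatches(candidate: str, reference: str = "test.wav") -> list:
    """List the fields where `candidate` differs from the reference WAV file."""
    want, got = wav_format(reference), wav_format(candidate)
    return [
        f"{field}: expected {w}, got {g}"
        for field, w, g in zip(WavFormat._fields, want, got)
        if w != g
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_wav.py my_recording.wav
    if len(sys.argv) != 2:
        sys.exit("usage: python check_wav.py <file.wav>")
    problems = format_mismatches(sys.argv[1])
    print("\n".join(problems) if problems else "Format matches test.wav")
```

If a mismatch is reported, resampling or downmixing the file to match test.wav with an audio tool of your choice is a reasonable first step before transcribing.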

Python API

from torongoxetu import TorongoModel

# Load model
model = TorongoModel("torongoXetu-asr.nemo")

# Single file transcription
text = model.transcribe("audio.wav")
print(text)

# Batch transcription
texts = model.transcribe(["file1.wav", "file2.wav"], batch_size=4)
print(texts)
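The batch call above takes an explicit list of files. To transcribe a whole folder, you can collect the paths yourself and pass them in one call. The sketch below assumes, based on the example above, that model.transcribe(list_of_paths, batch_size=...) returns one string per input file in the same order; the helper transcribe_folder and the folder name recordings are hypothetical.

```python
from pathlib import Path


def transcribe_folder(model, folder, batch_size=4, pattern="*.wav"):
    """Transcribe every matching file in `folder` and map file name -> text.

    `model` is anything with a transcribe(list_of_paths, batch_size=...) method
    that returns one string per path, such as the TorongoModel shown above.
    """
    paths = sorted(str(p) for p in Path(folder).glob(pattern))
    if not paths:
        return {}
    texts = model.transcribe(paths, batch_size=batch_size)
    # Guard the one-transcript-per-file assumption before pairing results.
    if len(texts) != len(paths):
        raise RuntimeError(f"expected {len(paths)} transcripts, got {len(texts)}")
    return {Path(p).name: text for p, text in zip(paths, texts)}


if __name__ == "__main__":
    from torongoxetu import TorongoModel

    model = TorongoModel("torongoXetu-asr.nemo")
    # "recordings" is a hypothetical folder of WAV files.
    for name, text in transcribe_folder(model, "recordings").items():
        print(f"{name}\t{text}")
```

Sorting the paths keeps the output order stable across runs, which makes it easier to diff transcripts after changing batch_size or the model file.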

Web Demo (Local)

Launch the interactive web interface:

python app.py

Open the URL shown in the terminal (typically http://127.0.0.1:7860). You can upload audio files, record from a microphone, or try the included samples.


Use Cases

  • Speech-to-text applications for Assamese
  • Voice assistants and transcription services
  • Research and academic projects
  • Subtitle generation and building ASR tools
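For subtitle generation, note that the Python API example above returns plain text, with no timestamps shown. One simple approach, an assumption rather than a documented torongoxetu feature, is to cut the audio into fixed-length windows, transcribe each window, and label each transcript with its window's start and end times. The sketch below covers only the timing and SubRip (SRT) formatting; cutting the audio into chunk files is left out, and all function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def fixed_chunks(duration: float, chunk_seconds: float = 10.0) -> list:
    """Split [0, duration) into consecutive (start, end) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * chunk_seconds < duration:
        start = i * chunk_seconds
        bounds.append((start, min(start + chunk_seconds, duration)))
        i += 1
    return bounds


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) triples as SRT subtitle cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Placeholder texts; in practice each would come from transcribing one chunk.
    windows = fixed_chunks(25.0, 10.0)
    texts = ["<chunk 1 text>", "<chunk 2 text>", "<chunk 3 text>"]
    print(to_srt([(s, e, t) for (s, e), t in zip(windows, texts)]))
```

Fixed windows can split words at chunk boundaries, so silence-based or voice-activity segmentation usually yields cleaner cues; the SRT formatting stays the same either way.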

Limitations

  • Designed for Assamese only; it is not expected to perform well on other languages.

License

MIT License


Author

Anand Dey
📧 ananddey.nic@gmail.com
