Model Card for adkta/nep_eng_code-mixed_translit_lm
Model Details
Model Description
This is an ASR (speech-to-text) model for Nepali-English code-mixed speech. The model is wav2vec 2.0 with a CTC head. Transcripts are generated in native script, i.e., Nepali in Devanagari script and English in Roman script. The model was created as part of thesis work in partial fulfillment of the M.Sc. in Information and Communication Engineering offered at Pulchowk Campus, IOE, TU, Nepal. This is the best-performing model, with a WER of 21.83. The model is trained on a Devanagarized version of the transcripts (English words are also written in Devanagari) and uses an LM during decoding. Conversion to native script is performed in post-processing using a transliteration dictionary created as part of the thesis and an LM. For more details, please reach out to the author at the email address given at the end of this card.
- Developed by: Ashish Devkota
- Model type: ASR
- Language(s) (NLP): Nepali-English Code-mixed
- License: [More Information Needed]
- Finetuned from model: wav2vec 2.0
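For readers unfamiliar with the metric quoted above, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition. The function name `word_error_rate` and the example sentences are illustrative only and are not taken from the thesis evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative code-mixed example (not from the dataset):
# one substituted word out of four reference words
reference = "म आज office जान्छु"
hypothesis = "म आज अफिस जान्छु"
print(word_error_rate(reference, hypothesis))
```

Because the score is computed on whitespace-separated words, a correct word written in the wrong script counts as an error, which is why the native-script conversion step matters for the reported figure.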
Bias, Risks, and Limitations
The model was trained with only ~2 hours of manually created Nepali-English code-mixed ASR data. The rest of the training data was created automatically from YouTube transcripts and Gemini and is less accurate than the manually created portion. More manually created data is needed for training.
How to Get Started with the Model
Use the code below to get started with the model. Install the dependencies: torch, torchcodec, torchaudio, flashlight-text (for the CTC decoder), and Hugging Face transformers (to load this model):
!pip install torch
!pip install torchcodec
!pip install torchaudio
!pip install flashlight-text
!pip install transformers
Install KenLM (for using LM):
!git clone https://github.com/kpu/kenlm.git
!sudo apt-get install libboost-all-dev --fix-missing # For colab
%cd kenlm
!mkdir -p build
%cd build
!cmake ..
!make -j 1
%cd ..
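import os
# `!export` runs in a throwaway subshell, so set the variables on the notebook process instead
os.environ["KENLM_ROOT"] = os.getcwd()  # current directory is the kenlm checkout
os.environ["USE_CUDA"] = "0"  # for CPU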
!pip install .
%cd ..
Transliteration repo for disambiguation pipeline:
!pip install nepali-num2word
!git clone https://github.com/adkta/nepali_arabic_num_to_word.git
!python -m pip install -U symspellpy
!rm -rf /content/transliteration # remove any previous clone (Colab path); -f avoids an error if it does not exist
!git clone https://github.com/adkta/transliteration.git
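Before moving on, it can help to confirm that the installed packages are visible to the notebook kernel. The sketch below uses only the standard library. The import names in `REQUIRED_MODULES` are assumptions about how the packages above expose themselves (for example, flashlight-text is assumed to import as `flashlight`), and `missing_modules` is a hypothetical helper. Note that `find_spec` only checks that a module can be located, not that it imports without error.

```python
import importlib
import importlib.util

# Import names assumed for the packages installed above
REQUIRED_MODULES = [
    "torch", "torchaudio", "torchcodec", "flashlight",
    "transformers", "kenlm", "symspellpy",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    importlib.invalidate_caches()  # pick up packages installed during this session
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not importable yet:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

If a module is reported as missing right after installation, restarting the notebook runtime is often enough.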
Download LM for disambiguation:
!wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/disambiguation_lm.binary
Download Reduction Dictionary for disambiguation:
!wget -L https://raw.githubusercontent.com/adkta/transliteration/main/dictionaries/Nep_Eng_Code-Mixed_Reduct_Dict_Gemini.json
Download lexicon and language model for decoder (or train your own KenLM):
!wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/Indic_Combined_Translit_LM/lexicon.lst
!wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/Indic_Combined_Translit_LM/lm.binary
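If you train your own KenLM model, the decoder also needs a matching lexicon. torchaudio's lexicon-based CTC decoder reads a text file in which each line holds a word followed by its spelling as space-separated tokens. The sketch below builds such lines by splitting each word into Unicode code points and appending the `|` word-boundary token. Whether code points match this model's token inventory is an assumption that should be checked against `processor.tokenizer.get_vocab()`, and `build_lexicon_lines` is a hypothetical helper, not part of the repository.

```python
def build_lexicon_lines(words, sil_token="|"):
    """Return one 'word t o k e n s |' line per unique word."""
    lines = []
    for word in sorted(set(words)):
        # Assumption: one token per Unicode code point
        spelling = " ".join(list(word) + [sil_token])
        lines.append(f"{word} {spelling}")
    return lines


# Illustrative Devanagari words (not taken from lexicon.lst)
for line in build_lexicon_lines(["नमस्ते", "अफिस"]):
    print(line)
```

The word list would typically be the vocabulary of the text used to train the language model, so that every word the LM can score also has a spelling.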
Download an audio file of your choice (say, 'test_audio.mp3'). Preprocess the input audio:
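import torchaudio
from torch import mean
filename = './test_audio.mp3'  # path to the audio file you downloaded
audio, sample_rate = torchaudio.load(filename)
# Downmix to mono by averaging channels, then resample to 16 kHz
mono = audio if audio.shape[0] == 1 else mean(audio, dim=0, keepdim=True)
resampled_mono = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(mono)
# Save the preprocessed audio (this overwrites './test_audio.mp3')
torchaudio.save(uri='./test_audio.mp3', src=resampled_mono, sample_rate=16000)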
Load the model:
from transformers import AutoModelForCTC, AutoProcessor
model_path = "adkta/nep_eng_code-mixed_translit_lm"  # Hugging Face repo ID
model = AutoModelForCTC.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)  # provides the tokenizer vocabulary used by the decoder
Create the CTC decoder:
from torchaudio.models.decoder import ctc_decoder
# Token list ordered by token ID, so that list position matches the model's output index
vocab_dict = processor.tokenizer.get_vocab()
sorted_vocab_list = [key for key, val in sorted(vocab_dict.items(), key=lambda item: item[1])]
decoder = ctc_decoder(
    lexicon='./lexicon.lst',
    beam_size=128,
    beam_size_token=100,
    beam_threshold=25.0,
    tokens=sorted_vocab_list,
    lm='./lm.binary',
    word_score=1.0,
    nbest=1,
    lm_weight=2,
    blank_token='<s>',
    sil_token='|'
)
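The token list passed to the decoder is ordered by token ID so that position k in the list names the symbol for column k of the model's logits. The stand-alone illustration below uses a made-up vocabulary (the real one comes from `processor.tokenizer.get_vocab()`). `tokens_by_id` and `check_special_tokens` are hypothetical helpers. The second one confirms that the blank and silence tokens named in the decoder call are actually present in the list.

```python
def tokens_by_id(vocab):
    """Order tokens by their integer ID so that list index == token ID."""
    return [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]


def check_special_tokens(tokens, blank_token, sil_token):
    """Return the special tokens that are missing from the token list."""
    return [t for t in (blank_token, sil_token) if t not in tokens]


# Made-up vocabulary for illustration only
toy_vocab = {"क": 3, "<s>": 0, "|": 2, "<pad>": 1, "म": 4}
tokens = tokens_by_id(toy_vocab)
print(tokens)
print(check_special_tokens(tokens, blank_token="<s>", sil_token="|"))
```

Running the same check on `sorted_vocab_list` before building the decoder gives an early, readable error signal if the vocabulary and the decoder arguments disagree.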
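Generate the hypothesis in Devanagari:
import torch
# Run the acoustic model without tracking gradients
with torch.no_grad():
    logits = model(resampled_mono).logits
# Beam-search decode with the lexicon and LM; keep the best hypothesis
ctc_hypo = decoder(logits)
print(ctc_hypo[0][0])
deva_text = " ".join(ctc_hypo[0][0].words)
print(deva_text)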
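For comparison, and as a quick sanity check when the lexicon or LM files are unavailable, CTC output can also be decoded greedily: take the most likely token per frame, merge consecutive repeats, then drop blanks. The sketch below applies that standard rule to a plain list of frame-level token IDs. It is not the beam-search decoder used above, and `greedy_ctc_collapse`, the toy token list, and the choice of index 0 as the blank are assumptions for illustration.

```python
from itertools import groupby


def greedy_ctc_collapse(frame_ids, tokens, blank_id=0, sil_token="|"):
    """Collapse per-frame argmax IDs into text using the CTC rule."""
    # 1) merge consecutive repeats, 2) drop blanks
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    symbols = [tokens[i] for i in merged if i != blank_id]
    # The word-boundary token becomes a space
    return "".join(symbols).replace(sil_token, " ").strip()


# Toy example: index 0 is the blank, "|" marks word boundaries
toy_tokens = ["<s>", "|", "h", "i", "o"]
toy_frames = [2, 2, 0, 3, 3, 1, 1, 0, 4, 0, 4]
print(greedy_ctc_collapse(toy_frames, toy_tokens))
```

With the real model, the frame IDs would come from the per-frame argmax of `logits`, and the token list would be `sorted_vocab_list`. The blank separating the two final `o` frames is what lets CTC emit a doubled letter.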
Convert to native format (Nepali in Devanagari script, English in Roman script):
# Conversion to native script
from transliteration.examples.disambiguation_examples import disambiguate
from transliteration.utils import get_reverse_dict
from transliteration.transliterator import TranslitDict
import kenlm
# LM used for disambiguation
LM_PATH = "./disambiguation_lm.binary"
lang_model = kenlm.LanguageModel(LM_PATH)
# Reverse-reduction dictionary
reduc_dict_path = "./Nep_Eng_Code-Mixed_Reduct_Dict_Gemini.json"
reverse_dict = get_reverse_dict(dictionary=TranslitDict.load(reduc_dict_path))
native_text = disambiguate(
    sentence=deva_text,
    model=lang_model,
    reverse_dict=reverse_dict,
    sym_spell=None,
    edit_dist=0,
    lang_scoring=False,
    sep_case_plural=True,
)
print(native_text)
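Conceptually, this step maps each Devanagari token back to its candidate native forms and lets a language model choose among them. The sketch below illustrates that idea with a hand-written candidate table and a toy scoring function. It is not the implementation in the transliteration repository, and `toy_reverse_dict`, `toy_score`, and `pick_native_forms` are hypothetical names. With kenlm, a sentence-level score could come from `kenlm.LanguageModel.score` instead of the toy function.

```python
from itertools import product

# Hypothetical candidate table: Devanagari token -> possible native forms
toy_reverse_dict = {
    "अफिस": ["office", "अफिस"],
    "मिटिङ": ["meeting", "मिटिङ"],
}

# Toy stand-in for a language model: forms it has "seen" score higher
KNOWN_FORMS = {"म", "आज", "office", "meeting", "छ"}


def toy_score(words):
    """Count how many words the toy model knows (higher is better)."""
    return sum(word in KNOWN_FORMS for word in words)


def pick_native_forms(sentence, reverse_dict, score_fn):
    """Try every combination of candidates and keep the best-scoring one."""
    # Tokens without an entry are kept unchanged
    options = [reverse_dict.get(token, [token]) for token in sentence.split()]
    best = max(product(*options), key=score_fn)
    return " ".join(best)


print(pick_native_forms("आज अफिस मा मिटिङ छ", toy_reverse_dict, toy_score))
```

Exhaustive enumeration grows exponentially with the number of ambiguous tokens, so a practical implementation would more likely prune candidates, for example with a beam search.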
Citation
BibTeX:
@mastersthesis{nep_en_cm_asr_devkota_2026,
author = "Devkota, Ashish",
title = "Beyond Monolingual: Leveraging Multilingual Pre-trained Models for End-to-End Nepali-English Code-mixed Speech Recognition",
school = "Tribhuvan University, Institute of Engineering, Pulchowk Campus",
year = "2026",
type = "M.Sc. Engg. Thesis",
address = "Lalitpur, Nepal",
month = "January"
}
Contact: Ashish Devkota <devkota.ashish@outlook.com> for the report or any other details about this model.