| library_name: transformers | |
| language: | |
| - ne | |
| - en | |
| datasets: | |
| - adkta/nep_eng_code-mixed_asr_dataset | |
| metrics: | |
| - wer | |
| results: | |
| - task: | |
| type: ASR | |
| metrics: | |
| - name: WER | |
| value: 21.83 | |
| # Model Card for adkta/nep_eng_code-mixed_translit_lm | |
| ## Model Details | |
| ### Model Description | |
| This is an automatic speech recognition (ASR, speech-to-text) model for Nepali-English code-mixed speech. | |
| The model is wav2vec 2.0 with a CTC head. | |
| Transcripts are produced in native script, i.e. Nepali in Devanagari script and English in Roman script. | |
| The model was created as part of thesis work in partial fulfillment of the M.Sc. in Information and Communication Engineering at Pulchowk Campus, IOE, TU, Nepal. | |
| This is the best-performing model, with a word error rate (WER) of 21.83. | |
| The model is trained on a Devanagarized version of the transcripts, so its raw output is entirely in Devanagari. | |
| A language model (LM) is used during decoding. | |
| Conversion to native script is performed in post-processing, using a transliteration dictionary created as part of the thesis together with an LM. | |
| For more details, please reach out to the author at the email address given at the end of this card. | |
| - **Developed by:** Ashish Devkota | |
| - **Model type:** ASR | |
| - **Language(s) (NLP):** Nepali-English Code-mixed | |
| - **License:** [More Information Needed] | |
| - **Finetuned from model [optional]:** wav2vec 2.0 | |
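The WER quoted above is conventionally the word-level edit distance between hypothesis and reference, divided by the number of reference words. A minimal sketch of that computation, assuming plain whitespace tokenization; `word_error_rate` is a hypothetical helper for illustration, not the scoring script used in the thesis:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words
print(word_error_rate("a b c d", "a b c x"))  # prints 0.25
```

The function returns a fraction between 0 and 1 (or above 1 when the hypothesis has many insertions). If the card's figure is a percentage, 21.83 would correspond to 0.2183 on this scale.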
| ## Bias, Risks, and Limitations | |
| Only about 2 hours of the training data is a manually created Nepali-English code-mixed ASR dataset. The rest of the training data was automatically | |
| created using YouTube transcripts and Gemini and is not as accurate as the manually created data. More manually created data is needed for training. | |
| ## How to Get Started with the Model | |
| Use the code below to get started with the model. | |
| Install dependencies: torch, torchcodec, torchaudio, flashlight-text (for CTC Decoder), huggingface transformers (to use this model): | |
| ``` | |
| !pip install torch | |
| !pip install torchcodec | |
| !pip install torchaudio | |
| !pip install flashlight-text | |
| !pip install transformers | |
| ``` | |
| Install KenLM (for using LM): | |
| ``` | |
| !git clone https://github.com/kpu/kenlm.git | |
| !sudo apt-get install libboost-all-dev --fix-missing # For colab | |
| %cd kenlm | |
| !mkdir -p build | |
| %cd build | |
| !cmake .. | |
| !make -j 1 | |
| %cd .. | |
| import os | |
| os.environ["KENLM_ROOT"] = os.getcwd()  # `!export` runs in a subshell and would not persist | |
| os.environ["USE_CUDA"] = "0"  # for CPU | |
| %cd .. | |
| %cd kenlm | |
| !pip install . | |
| %cd .. | |
| ``` | |
| Install the transliteration repo used by the disambiguation pipeline: | |
| ``` | |
| !pip install nepali-num2word | |
| !git clone https://github.com/adkta/nepali_arabic_num_to_word.git | |
| !python -m pip install -U symspellpy | |
| !rm -rf /content/transliteration  # -f: do not fail when the directory does not exist yet | |
| !git clone https://github.com/adkta/transliteration.git | |
| ``` | |
| Download LM for disambiguation: | |
| ``` | |
| !wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/disambiguation_lm.binary | |
| ``` | |
| Download Reduction Dictionary for disambiguation: | |
| ``` | |
| !wget -L https://raw.githubusercontent.com/adkta/transliteration/main/dictionaries/Nep_Eng_Code-Mixed_Reduct_Dict_Gemini.json | |
| ``` | |
| Download lexicon and language model for decoder (or train your own KenLM): | |
| ``` | |
| !wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/Indic_Combined_Translit_LM/lexicon.lst | |
| !wget -L https://raw.githubusercontent.com/adkta/Devkota_2026_nep_eng_asr/main/Indic_Combined_Translit_LM/lm.binary | |
| ``` | |
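If you train your own KenLM, you also need a matching lexicon. torchaudio's lexicon-based CTC decoder reads one entry per line: the word, followed by its spelling as space-separated tokens. A minimal sketch, assuming the acoustic model's tokens are single Unicode code points and that each spelling ends with the `|` word-boundary token; `lexicon_line` and `build_lexicon` are hypothetical helpers, and the `lexicon.lst` downloaded above may have been produced differently:

```python
def lexicon_line(word: str, boundary: str = "|") -> str:
    """Return 'word t o k e n s |' for a character-level token vocabulary."""
    return f"{word} {' '.join(word)} {boundary}"


def build_lexicon(words) -> str:
    """Build lexicon text with one entry per unique word, sorted for reproducibility."""
    return "\n".join(lexicon_line(w) for w in sorted(set(words))) + "\n"


print(build_lexicon(["ok", "go", "ok"]), end="")
```

The returned text can be written to a file with UTF-8 encoding and passed as the `lexicon` argument of the decoder. Every character used in a spelling must exist in the model's token list, otherwise the decoder cannot map the word.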
| Download an audio file of your choice (say 'test_audio.mp3') and preprocess it: | |
| ```python | |
| import torchaudio | |
| from torch import mean | |
| filename = './test_audio.mp3'  # path to the downloaded audio | |
| audio, sample_rate = torchaudio.load(filename) | |
| # Downmix to mono by averaging the channels | |
| mono = audio if audio.shape[0] == 1 else mean(audio, dim=0, keepdim=True) | |
| # Resample to 16 kHz | |
| resampled_mono = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)(mono) | |
| torchaudio.save(uri='./test_audio.mp3', src=resampled_mono, sample_rate=16000) | |
| ``` | |
| Load the model: | |
| ```python | |
| from transformers import AutoModelForCTC, AutoProcessor | |
| model_path = "adkta/nep_eng_code-mixed_translit_lm"  # Hugging Face Hub repo | |
| model = AutoModelForCTC.from_pretrained(model_path) | |
| processor = AutoProcessor.from_pretrained(model_path)  # feature extractor and tokenizer | |
| ``` | |
| Create the CTC decoder: | |
| ```python | |
| from torchaudio.models.decoder import ctc_decoder | |
| vocab_dict = processor.tokenizer.get_vocab() | |
| # Token list ordered by token ID, so list positions line up with the model's output indices | |
| sorted_vocab_list = [key for key, val in sorted(vocab_dict.items(), key=lambda item: item[1])] | |
| decoder = ctc_decoder( | |
|     lexicon='./lexicon.lst', | |
|     beam_size=128, | |
|     beam_size_token=100, | |
|     beam_threshold=25.0, | |
|     tokens=sorted_vocab_list, | |
|     lm='./lm.binary', | |
|     word_score=1.0, | |
|     nbest=1, | |
|     lm_weight=2, | |
|     blank_token='<s>', | |
|     sil_token='|' | |
| ) | |
| ``` | |
| Generate hypothesis in Devanagari: | |
| ```python | |
| import torch | |
| with torch.no_grad(): | |
|     logits = model(resampled_mono).logits | |
| ctc_hypo = decoder(logits) | |
| print(ctc_hypo[0][0])  # best hypothesis for the first (only) utterance | |
| deva_text = " ".join(ctc_hypo[0][0].words) | |
| print(deva_text) | |
| ``` | |
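For intuition about what the decoder consumes, CTC output can also be decoded greedily without a lexicon or LM: take the most likely token per frame, merge consecutive repeats, then drop blanks. A minimal pure-Python sketch, assuming the same token conventions as the decoder above (`<s>` as blank, `|` as word separator); `greedy_ctc_collapse` is a hypothetical helper and is not the LM-based beam search this model card uses:

```python
def greedy_ctc_collapse(frame_tokens, blank="<s>", word_sep="|"):
    """Collapse a per-frame token sequence into text using the CTC rules."""
    collapsed = []
    previous = None
    for token in frame_tokens:
        # Keep a token only when it starts a new run and is not the blank
        if token != previous and token != blank:
            collapsed.append(token)
        previous = token
    return "".join(collapsed).replace(word_sep, " ").strip()


# The blank between the two "e" frames is what keeps the double letter
frames = ["s", "s", "e", "<s>", "e", "|", "|", "m", "e", "<s>"]
print(greedy_ctc_collapse(frames))  # prints "see me"
```

With the real model, the frame tokens could be obtained as `[sorted_vocab_list[i] for i in logits[0].argmax(dim=-1).tolist()]`. Expect noticeably worse transcripts than the lexicon and LM decoder gives, since nothing constrains the output to valid words.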
| Convert to native format (Nepali in Devanagari script, English in Roman script): | |
| ```python | |
| #conversion to native | |
| from transliteration.examples.disambiguation_examples import disambiguate | |
| from transliteration.utils import get_reverse_dict | |
| from transliteration.transliterator import TranslitDict | |
| import kenlm | |
| #LM | |
| LM_PATH = "./disambiguation_lm.binary" | |
| lang_model = kenlm.LanguageModel(LM_PATH) | |
| #Reverse-Reduction Dictionary | |
| reduc_dict_path = "./Nep_Eng_Code-Mixed_Reduct_Dict_Gemini.json" | |
| reverse_dict = get_reverse_dict(dictionary = TranslitDict.load(reduc_dict_path)) | |
| native_text = disambiguate(sentence=deva_text, model = lang_model, reverse_dict = reverse_dict, sym_spell = None, edit_dist = 0, lang_scoring = False, sep_case_plural = True) | |
| print(native_text) | |
| ``` | |
| ## Citation | |
| **BibTeX:** | |
| ```bibtex | |
| @mastersthesis{nep_en_cm_asr_devkota_2026, | |
|   author = "Devkota, Ashish", | |
|   title = "Beyond Monolingual: Leveraging Multilingual Pre-trained Models for End-to-End Nepali-English Code-mixed Speech Recognition", | |
|   school = "Tribhuvan University, Institute of Engineering, Pulchowk Campus", | |
|   year = "2026", | |
|   type = "M.Sc. Engg. Thesis", | |
|   address = "Lalitpur, Nepal", | |
|   month = "January" | |
| } | |
| ``` | |
| Contact: Ashish Devkota <devkota.ashish@outlook.com> for the report or any other details about this model. | |