MMS Urmi ASR (adapter fine-tune, checkpoint step 8000)

Speech recognition (CTC) for Neo-Aramaic (Assyrian / Syriac tradition), Urmi (Urmia) Christian dialect. Hub metadata uses the ISO 639-3 language code aii (Assyrian Neo-Aramaic) rather than the non-ISO label urmi. Fine-tuned from facebook/mms-1b-all: training kept the wav2vec2 encoder frozen and updated only the MMS attention adapters and the CTC head.

Metrics (dev split at best checkpoint)

  • CER: 0.236
  • WER: 0.748

See MODEL_INFO.json in this repo for full training metadata.
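For reference, WER and CER are edit distances normalized by reference length, computed over words and characters respectively. A minimal self-contained version (libraries like jiwer or evaluate provide the same in practice):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via a single-row dynamic-programming table.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def wer(ref, hyp):
    # Word error rate: edit distance over whitespace-split tokens.
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    # Character error rate: edit distance over characters.
    return edit_distance(list(ref), list(hyp)) / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why a high WER can coexist with a much lower CER.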

Requirements

  • transformers ≥ 4.30 (config lists 5.7.0)
  • torch, soundfile or similar for audio I/O

Load and transcribe

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import soundfile as sf

model_id = "<YOUR_USERNAME>/<THIS_REPO_NAME>"  # after upload

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

audio, sr = sf.read("audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 16000:
    raise ValueError(f"Expected 16 kHz audio; got sr={sr} (resample first)")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
text = processor.batch_decode(pred_ids)[0]
print(text)

A local clone or download works the same way: pass the directory path instead of model_id.
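If your audio is not already 16 kHz, resample it before calling the processor. A dependency-free sketch using linear interpolation (for production quality, prefer torchaudio.functional.resample or scipy.signal.resample_poly):

```python
import numpy as np

def to_16k(audio: np.ndarray, sr: int, target: int = 16000) -> np.ndarray:
    # Naive linear-interpolation resampler; adequate for a quick demo only.
    if sr == target:
        return audio
    n_out = int(round(len(audio) * target / sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)
```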

Base model

This checkpoint was trained from facebook/mms-1b-all with adapter-only updates. The config.json and weights saved in this repo are self-contained: Wav2Vec2ForCTC.from_pretrained loads them without fetching the base model separately.
