Wav2Vec2-BERT CTC (speed perturbation)
Fine-tuned Wav2Vec2BertForCTC on Neo-Aramaic (Assyrian / Syriac tradition) — Urmi (Urmia) ASR data. Hub language: aii (Assyrian Neo-Aramaic). Base checkpoint: facebook/w2v-bert-2.0.
Input: log-mel features via SeamlessM4TFeatureExtractor (see preprocessor_config.json), sampling rate 16000 Hz.
Weights in this repository: model.safetensors (a copy of best_model).
Usage
```python
import torch
import soundfile as sf
from transformers import Wav2Vec2BertForCTC, Wav2Vec2BertProcessor

model_id = "<YOUR_HF_USERNAME>/<YOUR_REPO_NAME>"  # e.g. Selest/wav2vec2-bert_Assyrian_Urmi_ASR_model

processor = Wav2Vec2BertProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)
model.eval()

# The model expects 16 kHz mono audio.
audio, sr = sf.read("audio.wav", dtype="float32")
if sr != 16000:
    raise ValueError(f"Expected 16 kHz, got {sr}")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.inference_mode():
    logits = model(**inputs).logits

# Greedy CTC decoding: argmax per frame, then collapse repeats and blanks.
pred_ids = torch.argmax(logits, dim=-1)
text = processor.batch_decode(pred_ids)[0]
print(text)
```
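If your audio is not already at 16 kHz, resample it before calling the processor instead of raising. A minimal linear-interpolation sketch in plain NumPy (for real use, prefer torchaudio or librosa resampling, which apply proper anti-aliasing):

```python
import numpy as np

def resample_linear(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler; no anti-aliasing filter."""
    if sr == target_sr:
        return audio
    duration = len(audio) / sr
    n_out = int(round(duration * target_sr))
    t_in = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio).astype(np.float32)

# A 1-second clip at 8 kHz becomes 16000 samples at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000).astype(np.float32)
resampled = resample_linear(tone, 8000, 16000)
```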
Training
Data augmentation included speed perturbation (see the experiment repo / paper for details).
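Speed perturbation is commonly implemented by resampling the waveform by a small factor, which shortens or lengthens the clip (and shifts pitch). A minimal NumPy sketch; the factors 0.9/1.0/1.1 are the conventional choice, not confirmed values from this training run:

```python
import numpy as np

def speed_perturb(audio: np.ndarray, factor: float) -> np.ndarray:
    """Resample so the clip plays `factor` times faster (factor 1.1 -> ~10% shorter)."""
    n_out = int(round(len(audio) / factor))
    idx = np.linspace(0, len(audio) - 1, num=n_out)
    return np.interp(idx, np.arange(len(audio)), audio).astype(np.float32)

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000).astype(np.float32)  # 1 s of noise at 16 kHz
views = [speed_perturb(audio, f) for f in (0.9, 1.0, 1.1)]
```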
Limitations
Model quality depends on recording conditions and vocabulary coverage; evaluate on your domain before production use.
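For such an evaluation, word error rate (WER) is the usual metric: word-level edit distance divided by reference length. A self-contained sketch (in practice, libraries such as jiwer provide this via `jiwer.wer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))  # DP row: distance to each hyp prefix
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1] / max(len(ref), 1)
```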
Model: Selest/wav2vec2-bert_Assyrian_Urmi_ASR_model (base model: facebook/w2v-bert-2.0)