Wav2Vec2-BERT CTC (speed perturbation)

Fine-tuned Wav2Vec2BertForCTC on Neo-Aramaic (Assyrian / Syriac tradition), Urmi (Urmia) dialect ASR data. Hub language: aii (Assyrian Neo-Aramaic). Base checkpoint: facebook/w2v-bert-2.0.
Input: log-mel features via SeamlessM4TFeatureExtractor (see preprocessor_config.json), sampling rate 16000 Hz.

Weights in this repository: model.safetensors (a copy of best_model).

Usage

import torch
from transformers import Wav2Vec2BertForCTC, Wav2Vec2BertProcessor
import soundfile as sf

model_id = "<YOUR_HF_USERNAME>/<YOUR_REPO_NAME>"
processor = Wav2Vec2BertProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)

audio, sr = sf.read("audio.wav", dtype="float32")
if audio.ndim > 1:  # down-mix stereo to mono
    audio = audio.mean(axis=1)
if sr != 16000:
    raise ValueError(f"Expected 16 kHz, got {sr}")

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
with torch.inference_mode():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)
text = processor.batch_decode(pred_ids)[0]
print(text)
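If your audio is not already at 16 kHz, it can be resampled before feature extraction instead of raising an error. A minimal sketch using SciPy; the helper name `to_16k` is an assumption for illustration, not part of this repo:

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

def to_16k(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample mono audio to the model's expected 16 kHz rate."""
    if audio.ndim > 1:          # down-mix stereo (frames, channels) to mono
        audio = audio.mean(axis=1)
    if sr == target_sr:
        return audio
    g = gcd(sr, target_sr)      # rational resampling ratio
    return resample_poly(audio, target_sr // g, sr // g).astype(np.float32)
```

The resampled array can then be passed to the processor exactly as in the snippet above.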

Training

Augmentation included speed perturbation (details in experiment repo / paper).
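The exact augmentation code is not published here; a common Kaldi-style speed-perturbation recipe (factors such as 0.9 / 1.0 / 1.1) can be sketched as follows. The function name `speed_perturb` and the rational-ratio approximation are assumptions for illustration:

```python
import numpy as np
from scipy.signal import resample_poly

def speed_perturb(audio: np.ndarray, factor: float) -> np.ndarray:
    """Kaldi-style speed perturbation via resampling: playback becomes
    `factor`x faster, so 1.1 shortens the clip and 0.9 lengthens it."""
    # approximate the speed factor with a rational up/down ratio
    up, down = 100, int(round(100 * factor))
    return resample_poly(audio, up, down).astype(np.float32)
```

In this scheme each training utterance is typically kept at factor 1.0 and duplicated at 0.9 and 1.1, roughly tripling the effective training data.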

Limitations

Model quality depends on recording conditions and vocabulary coverage; evaluate on your domain before production use.

Model size: 0.6B params (F32, safetensors).

Model tree for Selest/wav2vec2-bert_Assyrian_Urmi_ASR_model: finetuned from facebook/w2v-bert-2.0.