
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning


A self-supervised speech representation learning model that combines masked language modeling with self-distillation and online clustering. The paper reports results surpassing previous state-of-the-art performance on several downstream speech processing tasks.

Table of Contents

  • Model Details
  • Usage
  • Citation
  • Additional Information

Model Details

Developers

  • Alexander H. Liu, Heng-Jui Chang (MIT CSAIL)
  • Michael Auli, Wei-Ning Hsu (Meta AI)
  • James Glass (MIT CSAIL)

Model Type

Self-supervised speech representation learning (Wav2Vec2 architecture variant)

Key Features

  • Self-distillation with a teacher-student framework (EMA-updated teacher)
  • Dynamic online clustering of teacher representations into discrete codewords
  • Masked prediction of the teacher's codeword assignments with a cross-entropy objective (see the training sketch below)
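
The released checkpoint contains only the encoder weights, but the way these pieces interact during pre-training can be sketched in plain PyTorch. The following is a minimal illustration of the idea rather than the authors' implementation; the function names (ema_update, assign_and_update_codebook, dinosr_style_loss) and the use of a single shared codebook are simplifying assumptions.

import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, tau=0.999):
    # The teacher tracks the student through an exponential moving average
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(tau).add_(p_s, alpha=1 - tau)

@torch.no_grad()
def assign_and_update_codebook(codebook, teacher_feats, decay=0.9):
    # Online clustering: assign every teacher frame to its nearest codeword,
    # then pull the selected codewords toward the assigned features
    dists = torch.cdist(teacher_feats, codebook)   # [num_frames, num_codewords]
    codes = dists.argmin(dim=-1)                   # cluster index per frame
    for k in codes.unique():
        mean_feat = teacher_feats[codes == k].mean(dim=0)
        codebook[k] = decay * codebook[k] + (1 - decay) * mean_feat
    return codes

def dinosr_style_loss(student_logits, codes, mask):
    # The student predicts the teacher's codeword index at masked positions
    return F.cross_entropy(student_logits[mask], codes[mask])

In the paper, the clustering is applied to several of the top teacher layers, each with its own codebook; the sketch above collapses this to a single codebook for brevity.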

Usage

Feature Extraction

from transformers import Wav2Vec2ForPreTraining, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load the model and feature extractor
model = Wav2Vec2ForPreTraining.from_pretrained("MohammadJRanjbar/DinoSR")
processor = Wav2Vec2FeatureExtractor.from_pretrained("MohammadJRanjbar/DinoSR")
model.eval()  # disable dropout for deterministic feature extraction

# Load audio and resample to the 16 kHz rate the feature extractor expects
audio, sr = librosa.load("speech.wav", sr=16000)
inputs = processor(audio, return_tensors="pt", sampling_rate=16000)

# Extract representations
with torch.no_grad():
    outputs = model(**inputs)

speech_features = outputs.projected_states  # [batch_size, seq_len, 256]
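
For downstream probing (e.g., SUPERB-style tasks), per-layer hidden states are often more useful than the projected states. Continuing from the snippet above, they can be requested directly; the shape comment below assumes the base configuration with hidden size 768.

# Request the hidden states of every transformer layer
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

all_layers = outputs.hidden_states   # tuple of (num_layers + 1) tensors, each [batch_size, seq_len, 768]
last_layer = all_layers[-1]          # output of the final transformer layer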

Fine-tuning for ASR

from transformers import Wav2Vec2ForCTC

# The CTC head is newly initialized; in practice, pass a vocab_size and
# pad_token_id that match your tokenizer (see the sketch below)
model = Wav2Vec2ForCTC.from_pretrained(
    "MohammadJRanjbar/DinoSR",
    attention_dropout=0.1,
    hidden_dropout=0.1,
    layerdrop=0.1,
    ctc_loss_reduction="mean"
)

# Freeze the convolutional feature encoder so only the transformer layers
# and the CTC head are updated during fine-tuning
model.freeze_feature_encoder()
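
Fine-tuning additionally needs a CTC processor whose vocabulary matches your transcripts. Below is a minimal, illustrative sketch of building one and running a single training step on dummy data; the vocab.json file, the dummy audio/text, and the hyperparameters are assumptions for illustration, not part of this repository.

from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)
import torch

# Character-level vocabulary built from your own training transcripts (hypothetical file)
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("MohammadJRanjbar/DinoSR")
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Reload the model so the newly initialized CTC head matches the tokenizer's vocabulary
model = Wav2Vec2ForCTC.from_pretrained(
    "MohammadJRanjbar/DinoSR",
    vocab_size=len(tokenizer),
    pad_token_id=tokenizer.pad_token_id,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()

# One illustrative training step on dummy data
audio = torch.randn(16000).numpy()                                  # ~1 second of audio at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("hello world", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
optimizer.step()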

Citation

@article{liu2023dinosr,
  title={DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning},
  author={Liu, Alexander H and Chang, Heng-Jui and Auli, Michael and Hsu, Wei-Ning and Glass, James},
  journal={arXiv preprint arXiv:2305.10005},
  year={2023}
}

Additional Information

Resources

  • Paper: https://arxiv.org/abs/2305.10005

Contact

For questions and feedback:

This model card was generated using best practices from Model Card Creator
