MMS Forced Alignment — ONNX

ONNX export of Meta's MMS_FA (Massively Multilingual Speech Forced Alignment) model for CTC forced alignment.

Files

| File | Size | Description |
| --- | --- | --- |
| mms_fa.onnx | 3.2 MB | ONNX model graph |
| mms_fa.onnx.data | 1.2 GB | External weight data (FP32) |
| tokenizer.json | 311 B | Token-to-ID mapping (29 tokens) |
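
As a rough illustration of how the token-to-ID mapping might be used, the sketch below loads tokenizer.json and converts a transcript into label IDs. It assumes the file is a flat JSON object that maps single-character tokens to integer IDs; that layout is not confirmed here, so inspect the file before relying on it. The function names `load_vocab` and `encode` are hypothetical helpers, not part of any library.

```python
import json


def load_vocab(path):
    """Read the token-to-ID mapping (assumed: a flat JSON object of str -> int)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode(text, vocab):
    """Map each character to its ID, skipping characters missing from the vocab.

    Skipping unknown characters is a simplification chosen for this sketch.
    """
    return [vocab[ch] for ch in text if ch in vocab]
```

With `tokenizer_path` from the download step, `encode("hello", load_vocab(tokenizer_path))` would return one ID per character that the vocabulary covers.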

Usage

```python
from huggingface_hub import hf_hub_download

# Download the graph, its external weights, and the tokenizer from the same repo.
model_path = hf_hub_download("xycld/lyric-align-mms-fa", "mms_fa.onnx")
data_path = hf_hub_download("xycld/lyric-align-mms-fa", "mms_fa.onnx.data")
tokenizer_path = hf_hub_download("xycld/lyric-align-mms-fa", "tokenizer.json")
```

Note: mms_fa.onnx.data must be in the same directory as mms_fa.onnx, because ONNX Runtime looks for the external weights next to the model file. Both files come from the same repository, so hf_hub_download places them in the same cache snapshot directory.
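
The same-directory requirement can be checked before loading. The following is a minimal sketch, assuming `onnxruntime` is installed and that CPU execution is wanted; `external_data_path` and `load_session` are hypothetical helper names introduced for illustration.

```python
from pathlib import Path


def external_data_path(model_path):
    """Return the path where the external weight file is expected to sit."""
    model_path = Path(model_path)
    return model_path.with_name(model_path.name + ".data")


def load_session(model_path):
    """Create a CPU inference session after checking the weight file exists."""
    data_path = external_data_path(model_path)
    if not data_path.is_file():
        raise FileNotFoundError(f"Missing external weights: {data_path}")
    # Imported here so the path check above works without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(str(model_path), providers=["CPUExecutionProvider"])
```

With `model_path` from the download step, `session = load_session(model_path)` returns a session whose input and output names can be read from `session.get_inputs()` and `session.get_outputs()` rather than guessed.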

Model Details

  • Original model: MMS_FA by Meta Research
  • Architecture: wav2vec2-based CTC forced aligner (315M parameters)
  • Precision: FP32
  • Input: mono 16 kHz waveform
  • Output: log-probability emission matrix [num_frames, 29 labels] at 50 fps (20 ms per frame)
  • ONNX opset: 18
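
Because the emission matrix has 50 frames per second, frame indices from an alignment convert directly into timestamps. The sketch below shows that arithmetic. It assumes a frame index marks the start of its 20 ms window, which is a convention chosen here rather than something this model card specifies; the function names are hypothetical.

```python
FRAME_RATE_HZ = 50  # emission frames per second (20 ms per frame)


def frame_to_seconds(frame_index):
    """Convert an emission frame index to a timestamp in seconds."""
    return frame_index / FRAME_RATE_HZ


def span_to_seconds(start_frame, end_frame):
    """Convert a [start_frame, end_frame) span to (start, end) in seconds."""
    return frame_to_seconds(start_frame), frame_to_seconds(end_frame)
```

For example, a token aligned to frames 10 through 24 (end index 25, exclusive) spans 0.2 s to 0.5 s.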

Quantized Variants

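| Variant | Size | Compression | Load Time | Inference | MAE | Acc @50ms | Acc @100ms | Acc @200ms | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FP32 (original) | 1,207 MB | 1.0x | 989 ms | 424 ms/line | 34.7 ms | 86.2% | 97.5% | 99.4% | Available |
| FP16 | 605 MB | 2.0x | 2,335 ms | 576 ms/line | 34.7 ms | 86.2% | 97.5% | 99.4% | Not recommended |
| UINT8 | 303 MB | 4.0x | 412 ms | 262 ms/line | 34.9 ms | 86.2% | 97.5% | 99.4% | Recommended |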

Benchmark: Chinese song "错位时空" (362 characters, 53 lines) on CPU.

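UINT8 is the recommended variant: compared with FP32 it is 75% smaller and takes 38% less inference time per line (about 1.6x faster), with virtually no accuracy loss (MAE +0.2 ms, identical accuracy at all three thresholds).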
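
The percentages quoted for UINT8 follow from the benchmark figures. The short sketch below reproduces the arithmetic using the size, inference time, and MAE values from the table; `relative_change` is a hypothetical helper, and no new measurements are introduced.

```python
# Figures copied from the benchmark table: size in MB, inference in ms/line, MAE in ms.
VARIANTS = {
    "FP32": {"size_mb": 1207, "infer_ms": 424, "mae_ms": 34.7},
    "FP16": {"size_mb": 605, "infer_ms": 576, "mae_ms": 34.7},
    "UINT8": {"size_mb": 303, "infer_ms": 262, "mae_ms": 34.9},
}


def relative_change(value, baseline):
    """Percentage change of value relative to baseline (negative means a reduction)."""
    return (value - baseline) / baseline * 100.0


if __name__ == "__main__":
    base = VARIANTS["FP32"]
    for name, v in VARIANTS.items():
        size = relative_change(v["size_mb"], base["size_mb"])
        time = relative_change(v["infer_ms"], base["infer_ms"])
        mae = v["mae_ms"] - base["mae_ms"]
        print(f"{name}: size {size:+.1f}%, inference time {time:+.1f}%, MAE {mae:+.1f} ms")
```

A negative inference-time change means the variant is faster than FP32; FP16 comes out positive, which matches its "Not recommended" status on CPU.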

FP16 is not recommended for CPU inference (no native FP16 support, slower than FP32). INT8 (QInt8) is incompatible with some ONNX runtimes due to ConvInteger operator requirements.

Attribution

This is an ONNX format conversion of Meta's MMS forced alignment model, originally distributed via torchaudio.pipelines.MMS_FA.

Original work:

Vineel Pratap, Andros Tjandra, Bowen Shi, et al. "Scaling Speech Technology to 1,000+ Languages." 2023. https://arxiv.org/abs/2305.13516

License

The original model weights are released by Meta under the CC-BY-NC-4.0 license. This ONNX conversion inherits the same license. Non-commercial use only.
