# WavLM-Large for MLX-Swift
Pre-converted weights of microsoft/wavlm-large for the WavLMLargeMLX Swift package.
Looking for the matching vocoder? See gwenn-ha-dev/hifigan-wavlm-mlx, the companion HiFi-GAN port of bshall/knn-vc. Together they implement the full kNN-VC pipeline (SSL features → audio) in pure Swift, with no Python at runtime.
- Format: safetensors, float32
- Size: 1.26 GB, 487 entries, 315.45 M parameters
- SHA-256: `21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3`
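To confirm an intact download, the checksum above can be verified with Python's standard library; a small sketch (the local file path in the commented example is an assumption about where you downloaded the weights):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical path after `huggingface-cli download --local-dir ./weights`:
# assert sha256_of("weights/wavlm-large.safetensors") == (
#     "21867dd6b9755b970676199c88a50b72010fae8b1a87467aa6977825aea9b2b3"
# )
```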
## What's in this repo
| File | Purpose |
|---|---|
| `wavlm-large.safetensors` | Pretrained weights for WavLM-Large. |
| `fixtures/wavlm/*.safetensors` | Reference forward outputs (sine / noise / mix) for parity testing. |
| `fixtures/wavlm/*.wav` | The corresponding test waveforms (16 kHz mono). |
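The fixture waveforms can be inspected with Python's built-in `wave` module to confirm the stated format; a minimal sketch (the exact `.wav` filename in the commented example is an assumption based on the fixture names used in the parity table below):

```python
import wave

def describe_wav(path: str):
    """Return (sample_rate, channels, n_frames) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getnframes()

# For these fixtures one would expect 16 kHz mono, e.g.:
# rate, channels, frames = describe_wav("fixtures/wavlm/sine_440_1s.wav")
# assert rate == 16000 and channels == 1
```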
## Difference vs upstream microsoft/wavlm-large
The upstream HF state dict has 488 entries; this one has 487. The single difference: `pos_conv_embed.conv` ships with weight_norm materialized. Instead of `weight_g` (norm scalar) + `weight_v` (direction tensor), we store a single fused `weight = weight_g · weight_v / ‖weight_v‖`. This is mathematically equivalent at inference (no numerical change) and lets MLX-Swift use a plain `Conv1d`, since MLX-Swift doesn't ship a weight_norm parametrization.
The materialization is a one-line PyTorch op:
```python
from torch.nn.utils.parametrize import remove_parametrizations

remove_parametrizations(
    model.encoder.pos_conv_embed.conv, "weight", leave_parametrized=True
)
```
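As a sanity check, the fusion formula can be reproduced by hand. A NumPy sketch with toy shapes (sizes are illustrative; the per-output-channel norm below is the common Conv1d convention, while the actual WavLM parametrization normalizes over a specific `dim` — the algebra is the same either way):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy Conv1d weight of shape (out_channels, in_channels, kernel).
v = rng.standard_normal((4, 3, 5)).astype(np.float32)   # direction (weight_v)
g = rng.standard_normal((4, 1, 1)).astype(np.float32)   # norm scalar (weight_g)

# Fused weight: w = g * v / ||v||, norm taken per output channel here.
norm = np.linalg.norm(v.reshape(4, -1), axis=1).reshape(4, 1, 1)
w = g * v / norm

# The fused tensor encodes the norm directly: ||w[c]|| == |g[c]| per channel.
assert np.allclose(
    np.linalg.norm(w.reshape(4, -1), axis=1), np.abs(g.ravel()), atol=1e-5
)
```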
## Numerical parity vs PyTorch reference
Max absolute difference, float32, on three deterministic fixtures:
| Stage | sine_440_1s | noise_2s | mixed_3s |
|---|---|---|---|
| Feature Extractor | 2.6e-7 | 3.6e-7 | 3.6e-7 |
| Layer 6 | 3.2e-4 | 1.7e-4 | 1.7e-4 |
| Layer 24 (post final-LN) | 9.2e-5 | 6.1e-5 | 7.8e-5 |
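The metric in the table is a plain max-absolute-difference between the MLX-Swift outputs and the stored PyTorch fixture activations; a sketch of the comparison, assuming both sides are available as float32 arrays (the helper name is illustrative):

```python
import numpy as np

def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Parity metric: max |a - b|, computed in float32."""
    a = reference.astype(np.float32)
    b = candidate.astype(np.float32)
    return float(np.max(np.abs(a - b)))

# Two nearly identical activations differ at the ~1e-7 level:
a = np.linspace(0.0, 1.0, 8, dtype=np.float32)
b = a + np.float32(2.5e-7)
assert max_abs_diff(a, b) < 1e-6
```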
## Usage
See the WavLMLargeMLX Swift package for full integration. Three lines:
```swift
let arrays = try SafetensorsLoader.load(url: url)
let model = WavLM()
try model.loadWeights(from: arrays)
```
To download from this repo:
```shell
huggingface-cli download gwenn-ha-dev/wavlm-large-mlx \
  wavlm-large.safetensors --local-dir ./weights
```
## License
Code (the Swift package): MIT. Weights derive from microsoft/wavlm-large; refer to the upstream model card for the model's own terms.
## Citation
```bibtex
@article{Chen2022WavLM,
  title   = {WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing},
  author  = {Chen, Sanyuan and Wang, Chengyi and Chen, Zhengyang and Wu, Yu and Liu, Shujie and Chen, Zhuo and Li, Jinyu and Kanda, Naoyuki and Yoshioka, Takuya and Xiao, Xiong and Wu, Jian and Zhou, Long and Ren, Shuo and Qian, Yanmin and Qian, Yao and Zeng, Michael and Wei, Furu},
  journal = {IEEE Journal of Selected Topics in Signal Processing},
  year    = {2022}
}
```