liperty-avhubert-encoder

Mirror of the AV-HuBERT large pretrained encoder (large_vox_iter5.pt, 3.91 GB), pretrained by Meta AI with self-supervision on LRS3 and VoxCeleb2.

The Liperty project rehosts these weights for durability. The original Meta CDN URL could move or expire, and in the Liperty project's assessment AV-HuBERT remains the strongest publicly downloadable LRS3-pretrained visual encoder.

Original source

Meta CDN: https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/clean-pretrain/large_vox_iter5.pt

Upstream repo: https://github.com/facebookresearch/av_hubert

Paper: Shi et al., "Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction," ICLR 2022.
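Because this repository is a mirror, a reasonable sanity check is to confirm that the rehosted file is byte-identical to the copy served from the Meta CDN URL above. This card does not publish a reference checksum, so the sketch below computes SHA-256 digests of two local copies and compares them. It is a minimal sketch that uses only the Python standard library. The function names `sha256_of` and `same_checkpoint` and any file paths are hypothetical. They are not part of AV-HuBERT or Liperty.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks (1 MiB by default) so a multi-GB
    checkpoint never has to be loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checkpoint(mirror_path, original_path):
    """Return True if two files have the same size and the same SHA-256 digest.

    Hypothetical helper: compare a copy downloaded from this mirror against
    one downloaded from the Meta CDN URL.
    """
    a, b = Path(mirror_path), Path(original_path)
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap early exit before hashing two ~4 GB files
    return sha256_of(a) == sha256_of(b)
```

Usage: call `same_checkpoint("mirror/large_vox_iter5.pt", "meta/large_vox_iter5.pt")`, where both paths are placeholders for wherever you saved each download. The size check runs first because it costs nothing, and a mismatch there already proves the files differ.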

Attribution

AV-HuBERT is licensed under the AV-HuBERT license, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

License

Non-commercial research use only. The full license text is bundled here as LICENSE.txt. Read it before using these weights.

How Liperty uses this

Research-only as of 2026-05. See docs/AVHUBERT_V3_BACKEND.md in the Liperty repo for the V3 backend integration plan. Liperty's production deployment currently uses the Auto-AVSR (V2) backend, and AV-HuBERT (V3) is being evaluated as a potential successor.
