liperty-avhubert-encoder
Mirror of the AV-HuBERT large pretrained encoder (large_vox_iter5.pt, 3.91 GB), pretrained by Meta AI with self-supervision on LRS3 and VoxCeleb2.
Rehosted by the Liperty project for durability: the original Meta CDN URL could move or expire, and AV-HuBERT remains the strongest publicly downloadable LRS3-pretrained visual encoder.
Original source
Meta CDN: https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/clean-pretrain/large_vox_iter5.pt
Upstream repo: https://github.com/facebookresearch/av_hubert
Paper: Shi et al., "Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction," ICLR 2022.
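While the Meta CDN URL is still live, the mirrored file can be checked against the upstream copy. The following is a minimal sketch using only the Python standard library. The helper names (download, sha256_of, same_file) and the local file paths are hypothetical, and this README does not list a reference checksum, so the comparison is between two local downloads rather than against a published digest.

```python
import hashlib
import urllib.request
from pathlib import Path

# URL copied from the "Original source" list in this README.
META_URL = (
    "https://dl.fbaipublicfiles.com/avhubert/model/"
    "lrs3_vox/clean-pretrain/large_vox_iter5.pt"
)


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream url to dest in chunks so the multi-gigabyte file never sits in memory."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    return dest


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a: Path, path_b: Path) -> bool:
    """True when two local files have the same size and SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # cheap check first; skips hashing on a size mismatch
    return sha256_of(a) == sha256_of(b)
```

A possible use, assuming the mirror copy was already saved as large_vox_iter5.pt: call download(META_URL, Path("large_vox_iter5.upstream.pt")), then same_file("large_vox_iter5.upstream.pt", "large_vox_iter5.pt"). Recording the output of sha256_of for the verified file gives a digest to check against if the upstream URL later disappears.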
Attribution
AV-HuBERT is licensed under the AV-HuBERT license, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
License
Non-commercial research use only. The full license text is bundled here as LICENSE.txt. Read it before using these weights.
How Liperty uses this
Liperty uses these weights for research only as of 2026-05. See docs/AVHUBERT_V3_BACKEND.md in the Liperty repo for the V3 backend integration plan. Liperty's production deployment currently uses the Auto-AVSR (V2) backend; AV-HuBERT (V3) is being evaluated as a potential successor.