facebook/pe-av-large

lematt1991 committed on Oct 4, 2025

Commit 4e78e1c (verified) · 1 Parent(s): 1c805ba

Upload README.md with huggingface_hub

Files changed (1)

README.md CHANGED

@@ -1,3 +1,31 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Perception Encoder Audio-Video
+## Model Summary
+Perception Encoder Audio-Video (PE-AV) is a family of state-of-the-art encoders for audio and video understanding, trained via scaled contrastive learning and built on top of the [PE image/video encoder](https://arxiv.org/abs/2504.13181).
+The model is available in the following sizes:
+- [`pe-av-small`](https://huggingface.co/facebook/pe-av-small): 12 layers, 209M parameters
+- [`pe-av-base`](https://huggingface.co/facebook/pe-av-base): 16 layers, 396M parameters
+- [`pe-av-large`](https://huggingface.co/facebook/pe-av-large): 28 layers, 1.597B parameters
+For each size, we additionally provide a variant that samples a fixed 16 frames for the video branch for efficiency:
+- [`pe-av-small-16-frame`](https://huggingface.co/facebook/pe-av-small-16-frame): 12 layers, 209M parameters
+- [`pe-av-base-16-frame`](https://huggingface.co/facebook/pe-av-base-16-frame): 16 layers, 396M parameters
+- [`pe-av-large-16-frame`](https://huggingface.co/facebook/pe-av-large-16-frame): 28 layers, 1.597B parameters
+## Usage
+Install `transformers` version 4.34.0 or later:
+```
+pip install 'transformers>=4.34.0'
+```
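
After installing, it can be useful to confirm that the environment actually satisfies the stated minimum. The following is a minimal sketch, not part of the model card: the helper names `parse_release`, `meets_minimum`, and `check_transformers` are hypothetical, and the comparison looks only at the leading numeric components of the version string, so a pre-release such as `4.34.0.dev0` is treated as `4.34.0`.

```python
import re
from importlib import metadata

# Minimum version stated in the README's install instruction.
MIN_VERSION = "4.34.0"


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints.

    Suffixes such as ".dev0" or "rc1" are ignored (a simplification), and
    short versions are zero-padded so that "4.34" compares equal to "4.34.0".
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # no-op when there are already 3+ parts
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Check whether an installed version string satisfies the minimum."""
    return parse_release(installed) >= parse_release(minimum)


def check_transformers() -> bool:
    """Return True if transformers is installed and new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    if check_transformers():
        print("transformers satisfies the minimum version")
    else:
        print(f"transformers is missing or older than {MIN_VERSION}")
```

For stricter handling of pre-release and post-release tags, the `packaging` library's version parsing could replace the regex-based helper.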