lematt1991 committed
Commit 4e78e1c · verified · 1 Parent(s): 1c805ba

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -3
README.md CHANGED
@@ -1,3 +1,31 @@
- ---
- license: apache-2.0
- ---
+
+ ---
+ license: apache-2.0
+ ---
+
+ # Perception Encoder Audio-Video
+
+ ## Model Summary
+
+ Perception Encoder Audio-Video (PE-AV) is a family of state-of-the-art encoders for audio and video understanding, trained via scaled contrastive learning and built on top of the [PE image/video encoder](https://arxiv.org/abs/2504.13181) (PE).
+
+ The model is available in the following sizes:
+
+ - [`pe-av-small`](https://huggingface.co/facebook/pe-av-small): 12 layers, 209M parameters
+ - [`pe-av-base`](https://huggingface.co/facebook/pe-av-base): 16 layers, 396M parameters
+ - [`pe-av-large`](https://huggingface.co/facebook/pe-av-large): 28 layers, 1.597B parameters
+
+ For each size, we additionally provide a more efficient version that samples a fixed 16 frames for the video branch:
+
+ - [`pe-av-small-16-frame`](https://huggingface.co/facebook/pe-av-small-16-frame): 12 layers, 209M parameters
+ - [`pe-av-base-16-frame`](https://huggingface.co/facebook/pe-av-base-16-frame): 16 layers, 396M parameters
+ - [`pe-av-large-16-frame`](https://huggingface.co/facebook/pe-av-large-16-frame): 28 layers, 1.597B parameters
+
+
+ ## Usage
+
+ Install `transformers` version 4.34.0 or later:
+
+ ```
+ pip install 'transformers>=4.34.0'
+ ```
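
To confirm that an existing environment already satisfies the `transformers>=4.34.0` requirement stated in the Usage section, a minimal sketch such as the following may help. The helper names `release_tuple` and `meets_minimum` are hypothetical and not part of the README or of `transformers`. Only the minimum version comes from the README. The comparison looks at the numeric release segment only, so it is a simplification of full PEP 440 ordering.

```python
import re
from importlib import metadata

# Minimum version taken from the README's install command.
MINIMUM_TRANSFORMERS = "4.34.0"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '4.34.0.dev0' -> (4, 34, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_TRANSFORMERS) -> bool:
    """Hypothetical helper: compare numeric release segments only.

    Simplification (assumption): pre-release and dev suffixes are ignored,
    so '4.34.0rc1' is treated the same as '4.34.0'.
    """
    have, want = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(want))
    # Pad with zeros so that '5.0' and '5.0.0' compare as equal.
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have >= want


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"transformers {installed}: {status}")
```

Running the script prints the installed version and whether it meets the minimum. If it reports the version as too old, re-running the `pip install` command above should upgrade it.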