Add link to paper, project page, and other checkpoints

#4
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +4 -2
README.md
@@ -6,8 +6,9 @@ pipeline_tag: zero-shot-image-classification
 
 # Model Details
 
-[\[📃 Tech Report\]](https://arxiv.org/abs/2504.13181)
+[\[📃 Tech Report\]](https://huggingface.co/papers/2504.13180)
 [\[📂 Github\]](https://github.com/facebookresearch/perception_models/)
+[\[🌐 Project page\]](https://ai.meta.com/datasets/plm-data/)
 
 Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
 are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -35,7 +36,8 @@ PE core curently comes in 3 sizes. PE core G is the main checkpoint, with L and
 | | Text | 0.47B | 1280 | 24 | 5120 | 20 | 1280 | 72 tokens |
 
 All PE core models use an attention pooling block with 8 heads on top of the vision tower. The L and B models _additionally_ have a class token for global aggregation. See the paper for more details.
-
+- B/16 model: [facebook/PE-Core-B16-224](https://huggingface.co/facebook/PE-Core-B16-224)
+- L/14 model: [facebook/PE-Core-L14-336](https://huggingface.co/facebook/PE-Core-L14-336)
 
 
 #### Model Performance
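Reviewers can sanity-check that the two checkpoint repo ids linked in this PR resolve on the Hub. A minimal sketch, assuming `huggingface_hub` is installed; it fetches only each repo's model card, not the weights:

```python
from huggingface_hub import hf_hub_download

# Repo ids of the checkpoints linked in the diff above.
checkpoints = [
    "facebook/PE-Core-B16-224",
    "facebook/PE-Core-L14-336",
]

# Download just the model card for each repo; a successful download
# confirms the repo id exists and is publicly readable.
cards = [hf_hub_download(repo_id=ckpt, filename="README.md") for ckpt in checkpoints]
print(cards)  # local cache paths to the two README.md files
```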