Add link to paper, project page, and other checkpoints
#4 opened by nielsr (HF Staff)

README.md (changed):
```diff
@@ -6,8 +6,9 @@ pipeline_tag: zero-shot-image-classification
 
 # Model Details
 
-[\[π Tech Report\]](https://
+[\[π Tech Report\]](https://huggingface.co/papers/2504.13180)
 [\[π Github\]](https://github.com/facebookresearch/perception_models/)
+[\[π Project page\]](https://ai.meta.com/datasets/plm-data/)
 
 Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
 are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -35,7 +36,8 @@ PE core currently comes in 3 sizes. PE core G is the main checkpoint, with L and
 | | Text | 0.47B | 1280 | 24 | 5120 | 20 | 1280 | 72 tokens |
 
 All PE core models use an attention pooling block with 8 heads on top of the vision tower. The L and B models _additionally_ have a class token for global aggregation. See the paper for more details.
-
+- B/16 model: [facebook/PE-Core-B16-224](https://huggingface.co/facebook/PE-Core-B16-224)
+- L/14 model: [facebook/PE-Core-L14-336](https://huggingface.co/facebook/PE-Core-L14-336)
 
 
 #### Model Performance
```
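The README text in the diff notes that every PE core model tops the vision tower with an 8-head attention-pooling block. As a rough illustration of what such a block computes, here is a minimal NumPy sketch: a learned probe vector cross-attends over the token embeddings and the heads are concatenated and projected. The names (`attention_pool`, `probe`, the `W*` projections) and the toy dimensions are hypothetical, not the `perception_models` API.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(tokens, probe, Wq, Wk, Wv, Wo, n_heads=8):
    """Pool N token embeddings (N, D) into a single (D,) vector.

    A learned probe cross-attends over the tokens with multi-head
    attention; head outputs are concatenated and projected by Wo.
    """
    N, D = tokens.shape
    hd = D // n_heads                                   # per-head dim
    Q = (probe @ Wq).reshape(n_heads, hd)               # (H, hd)
    K = (tokens @ Wk).reshape(N, n_heads, hd)           # (N, H, hd)
    V = (tokens @ Wv).reshape(N, n_heads, hd)           # (N, H, hd)
    # per-head attention weights over the N tokens
    scores = np.einsum('hd,nhd->hn', Q, K) / np.sqrt(hd)
    attn = softmax(scores, axis=-1)                     # (H, N)
    pooled = np.einsum('hn,nhd->hd', attn, V).reshape(D)  # concat heads
    return pooled @ Wo                                  # (D,)

# Toy demo with random weights (D=64 stands in for the real tower width).
rng = np.random.default_rng(0)
D, N = 64, 10
tokens = rng.normal(size=(N, D))
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) * D ** -0.5 for _ in range(4))
probe = rng.normal(size=D)
pooled = attention_pool(tokens, probe, Wq, Wk, Wv, Wo, n_heads=8)
print(pooled.shape)  # (64,)
```

In the real models this pooled vector (for G, and the class token for L and B) is what feeds the CLIP projection; the sketch only shows the pooling mechanics.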