Add link to paper, project page, and other checkpoints
#4 opened by nielsr (HF Staff)

README.md (changed):
```diff
@@ -6,8 +6,9 @@ pipeline_tag: zero-shot-image-classification
 
 # Model Details
 
-[\[π Tech Report\]](https://
+[\[π Tech Report\]](https://huggingface.co/papers/2504.13180)
 [\[π Github\]](https://github.com/facebookresearch/perception_models/)
+[\[π Project page\]](https://ai.meta.com/datasets/plm-data/)
 
 Perception Encoder (PE) is a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. It was introduced in "[Perception Encoder: The best visual embeddings
 are not at the output of the network](https://ai.meta.com/research/publications/perception-encoder-the-best-visual-embeddings-are-not-at-the-output-of-the-network/)".
@@ -35,7 +36,8 @@ PE core currently comes in 3 sizes. PE core G is the main checkpoint, with L and
 | | Text | 0.47B | 1280 | 24 | 5120 | 20 | 1280 | 72 tokens |
 
 All PE core models use an attention pooling block with 8 heads on top of the vision tower. The L and B models _additionally_ have a class token for global aggregation. See the paper for more details.
-
+- B/16 model: [facebook/PE-Core-B16-224](https://huggingface.co/facebook/PE-Core-B16-224)
+- L/14 model: [facebook/PE-Core-L14-336](https://huggingface.co/facebook/PE-Core-L14-336)
 
 
 #### Model Performance
```
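The README text in the diff notes that every PE core model tops the vision tower with an 8-head attention-pooling block. As a rough illustration of what such a block computes, here is a minimal NumPy sketch: a learned probe vector cross-attends over the token embeddings and the heads are concatenated and projected. The names (`attention_pool`, `probe`, the `W*` projections) and the toy dimensions are hypothetical, not the `perception_models` API.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(tokens, probe, Wq, Wk, Wv, Wo, n_heads=8):
    """Pool N token embeddings (N, D) into a single (D,) vector.

    A learned probe cross-attends over the tokens with multi-head
    attention; head outputs are concatenated and projected by Wo.
    """
    N, D = tokens.shape
    hd = D // n_heads                                   # per-head dim
    Q = (probe @ Wq).reshape(n_heads, hd)               # (H, hd)
    K = (tokens @ Wk).reshape(N, n_heads, hd)           # (N, H, hd)
    V = (tokens @ Wv).reshape(N, n_heads, hd)           # (N, H, hd)
    # per-head attention weights over the N tokens
    scores = np.einsum('hd,nhd->hn', Q, K) / np.sqrt(hd)
    attn = softmax(scores, axis=-1)                     # (H, N)
    pooled = np.einsum('hn,nhd->hd', attn, V).reshape(D)  # concat heads
    return pooled @ Wo                                  # (D,)

# Toy demo with random weights (D=64 stands in for the real tower width).
rng = np.random.default_rng(0)
D, N = 64, 10
tokens = rng.normal(size=(N, D))
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) * D ** -0.5 for _ in range(4))
probe = rng.normal(size=D)
pooled = attention_pool(tokens, probe, Wq, Wk, Wv, Wo, n_heads=8)
print(pooled.shape)  # (64,)
```

In the real models this pooled vector (for G, and the class token for L and B) is what feeds the CLIP projection; the sketch only shows the pooling mechanics.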