Image Feature Extraction
timm
Safetensors
Transformers
rwightman committed 1ca3e85 (verified) · 1 Parent(s): bacb309

Add model

Files changed (3)
  1. README.md +160 -0
  2. config.json +36 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,160 @@
+ ---
+ tags:
+ - image-feature-extraction
+ - timm
+ - transformers
+ pipeline_tag: image-feature-extraction
+ library_name: timm
+ license: fair-noncommercial-research-license
+ datasets:
+ - lvd-1689m
+ ---
+ # Model card for convnext_small.eupe_lvd1689m
+
+ An EUPE ConvNeXt image feature encoder, distilled on LVD-1689M with the Efficient Universal Perception Encoder (EUPE) method from a proxy teacher that was itself distilled from multiple domain-expert foundation vision encoders.
+
+ ## Model Details
+ - **Model Type:** Image Feature Encoder
+ - **Model Stats:**
+   - Params (M): 49.5
+   - GMACs: 11.4
+   - Activations (M): 28.2
+   - Image size: 256 x 256
+ - **Original:** https://github.com/facebookresearch/EUPE
+ - **License:** [FAIR Noncommercial Research License](https://huggingface.co/facebook/fair-noncommercial-research-license/)
+ - **Dataset:** LVD-1689M
+ - **Papers:**
+   - Efficient Universal Perception Encoder: https://arxiv.org/abs/2603.22387
+   - A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545
+   - PyTorch Image Models: https://github.com/huggingface/pytorch-image-models
+
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('convnext_small.eupe_lvd1689m', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ # NOTE: config.json sets num_classes to 0, so the model is built without a classifier
+ # and `output` is a (1, 768) pooled embedding; the top-5 indices are only class labels
+ # once a classification head has been trained on top of the encoder.
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
+
+ ### Feature Map Extraction
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'convnext_small.eupe_lvd1689m',
+     pretrained=True,
+     features_only=True,
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ for o in output:
+     # print shape of each feature map in output
+     # e.g.:
+     #  torch.Size([1, 96, 64, 64])
+     #  torch.Size([1, 192, 32, 32])
+     #  torch.Size([1, 384, 16, 16])
+     #  torch.Size([1, 768, 8, 8])
+     print(o.shape)
+ ```
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'convnext_small.eupe_lvd1689m',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 768, 8, 8) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
+
+ ## Model Comparison
+ See the associated paper for details on the evaluation protocols.
+
+ | Model | Params | TextVQA | SQA | Realworld | POPE | GQA | MMEp | SPair | NYUv2 | ADE20k |
+ |-------|--------|---------|-----|-----------|------|-----|------|-------|-------|--------|
+ | EUPE-ConvNeXt-T | 29M | 43.7 | 68.8 | 47.9 | 83.4 | 63.0 | 1278.1 | 41.3 | 0.430 | 43.5 |
+ | EUPE-ConvNeXt-S | 50M | 45.0 | 68.9 | 50.5 | 84.0 | 64.7 | 1284.2 | 40.1 | 0.388 | 46.8 |
+ | EUPE-ConvNeXt-B | 89M | 46.4 | 70.1 | 53.3 | 84.7 | 65.8 | 1348.9 | 37.7 | 0.365 | 48.9 |
+
+ ## Citation
+ ```bibtex
+ @misc{zhu2026eupe,
+   title={Efficient Universal Perception Encoder},
+   author={Zhu, Chenchen and Suri, Saksham and Jose, Cijo and Oquab, Maxime and Szafraniec, Marc and Wen, Wei and Xiong, Yunyang and Labatut, Patrick and Bojanowski, Piotr and Krishnamoorthi, Raghuraman and Chandra, Vikas},
+   year={2026},
+   eprint={2603.22387},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2603.22387},
+ }
+ ```
+ ```bibtex
+ @inproceedings{liu2022convnet,
+   author = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
+   title = {A ConvNet for the 2020s},
+   booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+   year = {2022},
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
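The feature-map and embedding shapes quoted in the README's usage examples can be cross-checked with simple stride arithmetic. This is a minimal sketch, assuming the usual ConvNeXt layout of an unpadded stride-4 stem followed by unpadded stride-2 downsampling between stages (overall strides 4, 8, 16 and 32); the stage widths 96, 192, 384 and 768 are taken from the README's `features_only` example output, and the final 8 x 8 map agrees with `pool_size` in config.json. The helper names `convnext_feature_shapes` and `pooled_embedding_shape` are hypothetical and not part of timm.

```python
from typing import List, Tuple

# Stage widths as listed in the README's features_only example output.
STAGE_CHANNELS = (96, 192, 384, 768)
# Assumed ConvNeXt downsampling: stride-4 stem, then 2x per later stage.
STAGE_STRIDES = (4, 8, 16, 32)


def convnext_feature_shapes(
    batch: int = 1, height: int = 256, width: int = 256
) -> List[Tuple[int, int, int, int]]:
    """Return the (B, C, H, W) shape of each stage output for one input size.

    Unpadded convolutions with kernel == stride floor-divide the spatial
    size, and nested floor divisions collapse to a single `// stride`.
    """
    return [
        (batch, c, height // s, width // s)
        for c, s in zip(STAGE_CHANNELS, STAGE_STRIDES)
    ]


def pooled_embedding_shape(batch: int = 1) -> Tuple[int, int]:
    """Global average pooling collapses the last (B, 768, H/32, W/32) map to (B, 768)."""
    return (batch, STAGE_CHANNELS[-1])


if __name__ == "__main__":
    for shape in convnext_feature_shapes():
        print(shape)
    print(pooled_embedding_shape())
```

At a different input size (for example 224 x 224) the same arithmetic gives a 7 x 7 final map, which is why the README quotes the 8 x 8 shape only for the default 256 x 256 input.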
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architecture": "convnext_small",
+   "num_classes": 0,
+   "num_features": 768,
+   "pretrained_cfg": {
+     "tag": "eupe_lvd1689m",
+     "custom_load": false,
+     "input_size": [
+       3,
+       256,
+       256
+     ],
+     "fixed_input_size": false,
+     "interpolation": "bicubic",
+     "crop_pct": 1.0,
+     "crop_mode": "center",
+     "mean": [
+       0.485,
+       0.456,
+       0.406
+     ],
+     "std": [
+       0.229,
+       0.224,
+       0.225
+     ],
+     "num_classes": 0,
+     "pool_size": [
+       8,
+       8
+     ],
+     "first_conv": "stem.0",
+     "classifier": "head.fc",
+     "license": "fair-noncommercial-research-license"
+   }
+ }
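The `pretrained_cfg` fields in config.json drive the evaluation transform that the README builds with `timm.data.create_transform(..., is_training=False)`. As a rough, stdlib-only sketch (an assumption about how these values are used, not a copy of timm's code): the shorter image side is resized with bicubic interpolation to `floor(256 / crop_pct)`, which stays 256 because `crop_pct` is 1.0, then a 256 x 256 center crop is taken and each channel is scaled to [0, 1] and normalized with the listed `mean` and `std`. The helpers `resize_dims`, `center_crop_box` and `normalize_pixel` are hypothetical and only compute sizes and values; they do not touch pixels.

```python
import math
from typing import Sequence, Tuple

# Values copied from pretrained_cfg in config.json.
INPUT_SIZE = 256
CROP_PCT = 1.0
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def resize_dims(width: int, height: int,
                input_size: int = INPUT_SIZE,
                crop_pct: float = CROP_PCT) -> Tuple[int, int]:
    """Size after resizing the shorter side to floor(input_size / crop_pct), keeping aspect ratio."""
    scale_size = math.floor(input_size / crop_pct)
    if width <= height:
        return scale_size, int(scale_size * height / width)
    return int(scale_size * width / height), scale_size


def center_crop_box(width: int, height: int,
                    size: int = INPUT_SIZE) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered size x size crop."""
    left = int(round((width - size) / 2.0))
    top = int(round((height - size) / 2.0))
    return left, top, left + size, top + size


def normalize_pixel(rgb: Sequence[int]) -> Tuple[float, ...]:
    """Scale 8-bit RGB values to [0, 1], then apply the per-channel mean/std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


if __name__ == "__main__":
    w, h = resize_dims(640, 480)  # a hypothetical 640 x 480 landscape image
    print((w, h), center_crop_box(w, h))
    print(normalize_pixel((124, 116, 104)))
```

Because `crop_pct` is 1.0, the resize target equals the crop size, so only the longer side of a non-square image is trimmed; a square image passes through the crop unchanged.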
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92f00ac2f509f0b7b6f74b158c26be72421288dd46236374821325d8d71f29bc
+ size 197852680
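The model.safetensors entry is a Git LFS pointer: the repository stores only the SHA-256 digest (`oid`) and byte length (`size`) of the real weights file. The stdlib-only sketch below parses such a pointer, checks a downloaded copy against it, and counts the stored tensor elements by reading the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor dtypes and shapes). The function names and the local `model.safetensors` path are hypothetical. As a rough cross-check, and assuming the weights are stored as 4-byte float32, 197,852,680 bytes corresponds to about 49.46M values, in line with the 49.5M parameters listed in the README.

```python
import hashlib
import json
import math
import struct
from pathlib import Path
from typing import Dict, Union

Pointer = Dict[str, Union[str, int]]


def parse_lfs_pointer(text: str) -> Pointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: Pointer) -> bool:
    """True if the file's byte length and digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(str(pointer["algo"]))
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


def count_safetensors_elements(path: Path) -> int:
    """Sum the element counts of all tensors listed in a safetensors header."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sum(
        math.prod(info["shape"])  # math.prod([]) == 1 for scalar tensors
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    )


if __name__ == "__main__":
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:92f00ac2f509f0b7b6f74b158c26be72421288dd46236374821325d8d71f29bc\n"
        "size 197852680\n"
    )
    local = Path("model.safetensors")  # hypothetical path to a downloaded copy
    if local.exists():
        print("pointer match:", matches_pointer(local, pointer))
        print("stored elements:", count_safetensors_elements(local))
    # Rough estimate assuming float32 weights; ignores the small JSON header.
    print("approx params (M):", int(pointer["size"]) / 4 / 1e6)
```

Counting from the header avoids loading any tensor data, so the check stays cheap even for large checkpoints.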