AnnaZhang commited on
Commit
d16fe74
·
verified ·
1 Parent(s): 06848ed

Upload folder using huggingface_hub

Browse files
Files changed (4) hide show
  1. README.md +92 -3
  2. config.json +836 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +26 -0
README.md CHANGED
@@ -1,3 +1,92 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - object-detection
5
+ - vision
6
+ datasets:
7
+ - coco
8
+ pipeline_tag: object-detection
9
+ library_name: transformers
10
+ ---
11
+
12
+ # LW-DETR (Light-Weight Detection Transformer)
13
+
14
+ LW-DETR, a Light-Weight DEtection TRansformer model, is designed to be a real-time object detection alternative that outperforms conventional convolutional (YOLO-style) and earlier transformer-based (DETR) methods in terms of speed and accuracy trade-off. It was introduced in the paper [LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection](https://huggingface.co/papers/2406.03459) by Chen et al. and first released in this repository.
15
+ Disclaimer: This model was originally contributed by [stevenbucaille](https://huggingface.co/stevenbucaille) in 🤗 transformers.
16
+
17
+ ## Model description
18
+
19
+ LW-DETR is an end-to-end object detection model that uses a Vision Transformer (ViT) backbone as its encoder, a simple convolutional projector, and a shallow DETR decoder. The core philosophy is to leverage the power of transformers while implementing several efficiency-focused techniques to achieve real-time performance.
20
+
21
+ Key Architectural Details:
22
+ - ViT Encoder: Uses a plain ViT architecture. To reduce the quadratic complexity of global self-attention, it adopts interleaved window and global attentions.
23
+ - Window-Major Organization: It employs a highly efficient window-major feature map organization scheme for attention computation, which drastically reduces the costly memory permutation operations required when transitioning between global and window attention modes, leading to lower inference latency.
24
+ - Feature Aggregation: It aggregates features from multiple levels (intermediate and final layers) of the ViT encoder to create richer input for the decoder.
25
+ - Projector: A C2f block (from YOLOv8) connects the encoder and decoder. For larger versions (large/xlarge), it outputs two-scale features ($1/8$ and $1/32$) to the decoder.
26
+ - Shallow DETR Decoder: It uses a computationally efficient 3-layer transformer decoder (instead of the standard 6 layers), incorporating deformable cross-attention for faster convergence and lower latency.
27
+ - Object Queries: It uses a mixed-query selection scheme to form the object queries from both learnable content queries and generated spatial queries (based on top-K features from the Projector).
28
+
29
+ Training Details:
30
+ - IoU-aware Classification Loss (IA-BCE loss): Enhances the classification branch by incorporating IoU information into the target score $t=s^{\alpha}u^{1-\alpha}$.
31
+ - Group DETR: Uses a Group DETR strategy (13 parallel weight-sharing decoders) for faster training convergence without affecting inference speed.
32
+ - Pretraining: Uses a two-stage pretraining strategy: first, ViT is pretrained on Objects365 using a Masked Image Modeling (MIM) method (CAEv2), followed by supervised retraining of the encoder and training of the projector and decoder on Objects365. This provides a significant performance boost (average of $\approx 5.5\text{ mAP}$).
33
+
34
+ ### How to use
35
+
36
+ You can use the raw model for object detection. See the [model hub](https://huggingface.co/models?search=stevenbucaille/lw-detr) to look for all available LW DETR models.
37
+
38
+ Here is how to use this model:
39
+
40
+ ```python
41
+ from transformers import AutoImageProcessor, LwDetrForObjectDetection
42
+ import torch
43
+ from PIL import Image
44
+ import requests
45
+
46
+ url = "http://images.cocodataset.org/val2017/000000039769.jpg"
47
+ image = Image.open(requests.get(url, stream=True).raw)
48
+
49
+ processor = AutoImageProcessor.from_pretrained("stevenbucaille/lwdetr_medium_30e_objects365")
50
+ model = LwDetrForObjectDetection.from_pretrained("stevenbucaille/lwdetr_medium_30e_objects365")
51
+
52
+ inputs = processor(images=image, return_tensors="pt")
53
+ outputs = model(**inputs)
54
+
55
+ # convert outputs (bounding boxes and class logits) to COCO API
56
+ # let's only keep detections with score > 0.7
57
+ target_sizes = torch.tensor([image.size[::-1]])
58
+ results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.7)[0]
59
+
60
+ for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
61
+ box = [round(i, 2) for i in box.tolist()]
62
+ print(
63
+ f"Detected {model.config.id2label[label.item()]} with confidence "
64
+ f"{round(score.item(), 3)} at location {box}"
65
+ )
66
+ ```
67
+ This should output:
68
+ ```
69
+ Detected Jug with confidence 0.942 at location [345.8, 23.79, 640.09, 371.73]
70
+ Detected Jug with confidence 0.911 at location [6.84, 55.37, 318.4, 474.02]
71
+ Detected Refrigerator with confidence 0.901 at location [41.0, 72.79, 175.38, 117.24]
72
+ Detected Refrigerator with confidence 0.788 at location [334.38, 76.9, 370.48, 187.82]
73
+ ```
74
+
75
+ Currently, both the feature extractor and model support PyTorch.
76
+
77
+ ## Training data
78
+
79
+ The LW-DETR models are trained/finetuned on the following datasets:
80
+ - Pretraining: Primarily conducted on [Objects365](https://www.objects365.org/overview.html), a large-scale, high-quality dataset for object detection.
81
+ - Finetuning: Final training is performed on the standard [COCO 2017 object detection dataset](https://cocodataset.org/#home).
82
+
83
+ ### BibTeX entry and citation info
84
+
85
+ ```bibtex
86
+ @article{chen2024lw,
87
+ title={LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection},
88
+ author={Chen, Qiang and Su, Xiangbo and Zhang, Xinyu and Wang, Jian and Chen, Jiahui and Shen, Yunpeng and Han, Chuchu and Chen, Ziliang and Xu, Weixiang and Li, Fanrong and others},
89
+ journal={arXiv preprint arXiv:2406.03459},
90
+ year={2024}
91
+ }
92
+ ```
config.json ADDED
@@ -0,0 +1,836 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "activation_dropout": 0.0,
3
+ "activation_function": "silu",
4
+ "architectures": [
5
+ "LwDetrForObjectDetection"
6
+ ],
7
+ "attention_bias": true,
8
+ "attention_dropout": 0.0,
9
+ "auxiliary_loss": true,
10
+ "backbone": null,
11
+ "backbone_config": {
12
+ "cae_init_values": 0.1,
13
+ "dropout_prob": 0.0,
14
+ "hidden_act": "gelu",
15
+ "hidden_size": 384,
16
+ "image_size": 1024,
17
+ "initializer_range": 0.02,
18
+ "layer_norm_eps": 1e-06,
19
+ "mlp_ratio": 4,
20
+ "model_type": "lw_detr_vit",
21
+ "num_attention_heads": 12,
22
+ "num_channels": 3,
23
+ "num_hidden_layers": 10,
24
+ "num_windows": 16,
25
+ "num_windows_side": 4,
26
+ "out_features": [
27
+ "stage3",
28
+ "stage5",
29
+ "stage6",
30
+ "stage10"
31
+ ],
32
+ "out_indices": [
33
+ 3,
34
+ 5,
35
+ 6,
36
+ 10
37
+ ],
38
+ "patch_size": 16,
39
+ "pretrain_image_size": 224,
40
+ "qkv_bias": true,
41
+ "stage_names": [
42
+ "stem",
43
+ "stage1",
44
+ "stage2",
45
+ "stage3",
46
+ "stage4",
47
+ "stage5",
48
+ "stage6",
49
+ "stage7",
50
+ "stage8",
51
+ "stage9",
52
+ "stage10"
53
+ ],
54
+ "use_absolute_position_embeddings": true,
55
+ "window_block_indices": [
56
+ 0,
57
+ 1,
58
+ 3,
59
+ 6,
60
+ 7,
61
+ 9
62
+ ]
63
+ },
64
+ "backbone_kwargs": null,
65
+ "batch_norm_eps": 1e-05,
66
+ "bbox_cost": 5,
67
+ "bbox_loss_coefficient": 5,
68
+ "class_cost": 2,
69
+ "d_model": 256,
70
+ "decoder_activation_function": "relu",
71
+ "decoder_cross_attention_heads": 16,
72
+ "decoder_ffn_dim": 2048,
73
+ "decoder_layers": 3,
74
+ "decoder_n_points": 2,
75
+ "decoder_self_attention_heads": 8,
76
+ "dice_loss_coefficient": 1,
77
+ "disable_custom_kernels": true,
78
+ "dropout": 0.1,
79
+ "dtype": "float32",
80
+ "eos_coefficient": 0.1,
81
+ "focal_alpha": 0.25,
82
+ "giou_cost": 2,
83
+ "giou_loss_coefficient": 2,
84
+ "group_detr": 13,
85
+ "hidden_expansion": 0.5,
86
+ "id2label": {
87
+ "0": "Person",
88
+ "1": "Sneakers",
89
+ "10": "Cup",
90
+ "100": "Hanger",
91
+ "101": "Blackboard/Whiteboard",
92
+ "102": "Napkin",
93
+ "103": "Other Fish",
94
+ "104": "Orange/Tangerine",
95
+ "105": "Toiletry",
96
+ "106": "Keyboard",
97
+ "107": "Tomato",
98
+ "108": "Lantern",
99
+ "109": "Machinery Vehicle",
100
+ "11": "Street Lights",
101
+ "110": "Fan",
102
+ "111": "Green Vegetables",
103
+ "112": "Banana",
104
+ "113": "Baseball Glove",
105
+ "114": "Airplane",
106
+ "115": "Mouse",
107
+ "116": "Train",
108
+ "117": "Pumpkin",
109
+ "118": "Soccer",
110
+ "119": "Skiboard",
111
+ "12": "Cabinet/shelf",
112
+ "120": "Luggage",
113
+ "121": "Nightstand",
114
+ "122": "Tea pot",
115
+ "123": "Telephone",
116
+ "124": "Trolley",
117
+ "125": "Head Phone",
118
+ "126": "Sports Car",
119
+ "127": "Stop Sign",
120
+ "128": "Dessert",
121
+ "129": "Scooter",
122
+ "13": "Handbag/Satchel",
123
+ "130": "Stroller",
124
+ "131": "Crane",
125
+ "132": "Remote",
126
+ "133": "Refrigerator",
127
+ "134": "Oven",
128
+ "135": "Lemon",
129
+ "136": "Duck",
130
+ "137": "Baseball Bat",
131
+ "138": "Surveillance Camera",
132
+ "139": "Cat",
133
+ "14": "Bracelet",
134
+ "140": "Jug",
135
+ "141": "Broccoli",
136
+ "142": "Piano",
137
+ "143": "Pizza",
138
+ "144": "Elephant",
139
+ "145": "Skateboard",
140
+ "146": "Surfboard",
141
+ "147": "Gun",
142
+ "148": "Skating and Skiing shoes",
143
+ "149": "Gas stove",
144
+ "15": "Plate",
145
+ "150": "Donut",
146
+ "151": "Bow Tie",
147
+ "152": "Carrot",
148
+ "153": "Toilet",
149
+ "154": "Kite",
150
+ "155": "Strawberry",
151
+ "156": "Other Balls",
152
+ "157": "Shovel",
153
+ "158": "Pepper",
154
+ "159": "Computer Box",
155
+ "16": "Picture/Frame",
156
+ "160": "Toilet Paper",
157
+ "161": "Cleaning Products",
158
+ "162": "Chopsticks",
159
+ "163": "Microwave",
160
+ "164": "Pigeon",
161
+ "165": "Baseball",
162
+ "166": "Cutting/chopping Board",
163
+ "167": "Coffee Table",
164
+ "168": "Side Table",
165
+ "169": "Scissors",
166
+ "17": "Helmet",
167
+ "170": "Marker",
168
+ "171": "Pie",
169
+ "172": "Ladder",
170
+ "173": "Snowboard",
171
+ "174": "Cookies",
172
+ "175": "Radiator",
173
+ "176": "Fire Hydrant",
174
+ "177": "Basketball",
175
+ "178": "Zebra",
176
+ "179": "Grape",
177
+ "18": "Book",
178
+ "180": "Giraffe",
179
+ "181": "Potato",
180
+ "182": "Sausage",
181
+ "183": "Tricycle",
182
+ "184": "Violin",
183
+ "185": "Egg",
184
+ "186": "Fire Extinguisher",
185
+ "187": "Candy",
186
+ "188": "Fire Truck",
187
+ "189": "Billiards",
188
+ "19": "Gloves",
189
+ "190": "Converter",
190
+ "191": "Bathtub",
191
+ "192": "Wheelchair",
192
+ "193": "Golf Club",
193
+ "194": "Briefcase",
194
+ "195": "Cucumber",
195
+ "196": "Cigar/Cigarette",
196
+ "197": "Paint Brush",
197
+ "198": "Pear",
198
+ "199": "Heavy Truck",
199
+ "2": "Chair",
200
+ "20": "Storage box",
201
+ "200": "Hamburger",
202
+ "201": "Extractor",
203
+ "202": "Extension Cord",
204
+ "203": "Tong",
205
+ "204": "Tennis Racket",
206
+ "205": "Folder",
207
+ "206": "American Football",
208
+ "207": "earphone",
209
+ "208": "Mask",
210
+ "209": "Kettle",
211
+ "21": "Boat",
212
+ "210": "Tennis",
213
+ "211": "Ship",
214
+ "212": "Swing",
215
+ "213": "Coffee Machine",
216
+ "214": "Slide",
217
+ "215": "Carriage",
218
+ "216": "Onion",
219
+ "217": "Green beans",
220
+ "218": "Projector",
221
+ "219": "Frisbee",
222
+ "22": "Leather Shoes",
223
+ "220": "Washing Machine/Drying Machine",
224
+ "221": "Chicken",
225
+ "222": "Printer",
226
+ "223": "Watermelon",
227
+ "224": "Saxophone",
228
+ "225": "Tissue",
229
+ "226": "Toothbrush",
230
+ "227": "Ice cream",
231
+ "228": "Hot-air balloon",
232
+ "229": "Cello",
233
+ "23": "Flower",
234
+ "230": "French Fries",
235
+ "231": "Scale",
236
+ "232": "Trophy",
237
+ "233": "Cabbage",
238
+ "234": "Hot dog",
239
+ "235": "Blender",
240
+ "236": "Peach",
241
+ "237": "Rice",
242
+ "238": "Wallet/Purse",
243
+ "239": "Volleyball",
244
+ "24": "Bench",
245
+ "240": "Deer",
246
+ "241": "Goose",
247
+ "242": "Tape",
248
+ "243": "Tablet",
249
+ "244": "Cosmetics",
250
+ "245": "Trumpet",
251
+ "246": "Pineapple",
252
+ "247": "Golf Ball",
253
+ "248": "Ambulance",
254
+ "249": "Parking meter",
255
+ "25": "Potted Plant",
256
+ "250": "Mango",
257
+ "251": "Key",
258
+ "252": "Hurdle",
259
+ "253": "Fishing Rod",
260
+ "254": "Medal",
261
+ "255": "Flute",
262
+ "256": "Brush",
263
+ "257": "Penguin",
264
+ "258": "Megaphone",
265
+ "259": "Corn",
266
+ "26": "Bowl/Basin",
267
+ "260": "Lettuce",
268
+ "261": "Garlic",
269
+ "262": "Swan",
270
+ "263": "Helicopter",
271
+ "264": "Green Onion",
272
+ "265": "Sandwich",
273
+ "266": "Nuts",
274
+ "267": "Speed Limit Sign",
275
+ "268": "Induction Cooker",
276
+ "269": "Broom",
277
+ "27": "Flag",
278
+ "270": "Trombone",
279
+ "271": "Plum",
280
+ "272": "Rickshaw",
281
+ "273": "Goldfish",
282
+ "274": "Kiwi fruit",
283
+ "275": "Router/modem",
284
+ "276": "Poker Card",
285
+ "277": "Toaster",
286
+ "278": "Shrimp",
287
+ "279": "Sushi",
288
+ "28": "Pillow",
289
+ "280": "Cheese",
290
+ "281": "Notepaper",
291
+ "282": "Cherry",
292
+ "283": "Pliers",
293
+ "284": "CD",
294
+ "285": "Pasta",
295
+ "286": "Hammer",
296
+ "287": "Cue",
297
+ "288": "Avocado",
298
+ "289": "Hami melon",
299
+ "29": "Boots",
300
+ "290": "Flask",
301
+ "291": "Mushroom",
302
+ "292": "Screwdriver",
303
+ "293": "Soap",
304
+ "294": "Recorder",
305
+ "295": "Bear",
306
+ "296": "Eggplant",
307
+ "297": "Board Eraser",
308
+ "298": "Coconut",
309
+ "299": "Tape Measure/Ruler",
310
+ "3": "Other Shoes",
311
+ "30": "Vase",
312
+ "300": "Pig",
313
+ "301": "Showerhead",
314
+ "302": "Globe",
315
+ "303": "Chips",
316
+ "304": "Steak",
317
+ "305": "Crosswalk Sign",
318
+ "306": "Stapler",
319
+ "307": "Camel",
320
+ "308": "Formula 1",
321
+ "309": "Pomegranate",
322
+ "31": "Microphone",
323
+ "310": "Dishwasher",
324
+ "311": "Crab",
325
+ "312": "Hoverboard",
326
+ "313": "Meatball",
327
+ "314": "Rice Cooker",
328
+ "315": "Tuba",
329
+ "316": "Calculator",
330
+ "317": "Papaya",
331
+ "318": "Antelope",
332
+ "319": "Parrot",
333
+ "32": "Necklace",
334
+ "320": "Seal",
335
+ "321": "Butterfly",
336
+ "322": "Dumbbell",
337
+ "323": "Donkey",
338
+ "324": "Lion",
339
+ "325": "Urinal",
340
+ "326": "Dolphin",
341
+ "327": "Electric Drill",
342
+ "328": "Hair Dryer",
343
+ "329": "Egg tart",
344
+ "33": "Ring",
345
+ "330": "Jellyfish",
346
+ "331": "Treadmill",
347
+ "332": "Lighter",
348
+ "333": "Grapefruit",
349
+ "334": "Game board",
350
+ "335": "Mop",
351
+ "336": "Radish",
352
+ "337": "Baozi",
353
+ "338": "Target",
354
+ "339": "French",
355
+ "34": "SUV",
356
+ "340": "Spring Rolls",
357
+ "341": "Monkey",
358
+ "342": "Rabbit",
359
+ "343": "Pencil Case",
360
+ "344": "Yak",
361
+ "345": "Red Cabbage",
362
+ "346": "Binoculars",
363
+ "347": "Asparagus",
364
+ "348": "Barbell",
365
+ "349": "Scallop",
366
+ "35": "Wine Glass",
367
+ "350": "Noddles",
368
+ "351": "Comb",
369
+ "352": "Dumpling",
370
+ "353": "Oyster",
371
+ "354": "Table Tennis paddle",
372
+ "355": "Cosmetics Brush/Eyeliner Pencil",
373
+ "356": "Chainsaw",
374
+ "357": "Eraser",
375
+ "358": "Lobster",
376
+ "359": "Durian",
377
+ "36": "Belt",
378
+ "360": "Okra",
379
+ "361": "Lipstick",
380
+ "362": "Cosmetics Mirror",
381
+ "363": "Curling",
382
+ "364": "Table Tennis",
383
+ "365": "N/A",
384
+ "37": "Monitor/TV",
385
+ "38": "Backpack",
386
+ "39": "Umbrella",
387
+ "4": "Hat",
388
+ "40": "Traffic Light",
389
+ "41": "Speaker",
390
+ "42": "Watch",
391
+ "43": "Tie",
392
+ "44": "Trash bin Can",
393
+ "45": "Slippers",
394
+ "46": "Bicycle",
395
+ "47": "Stool",
396
+ "48": "Barrel/bucket",
397
+ "49": "Van",
398
+ "5": "Car",
399
+ "50": "Couch",
400
+ "51": "Sandals",
401
+ "52": "Basket",
402
+ "53": "Drum",
403
+ "54": "Pen/Pencil",
404
+ "55": "Bus",
405
+ "56": "Wild Bird",
406
+ "57": "High Heels",
407
+ "58": "Motorcycle",
408
+ "59": "Guitar",
409
+ "6": "Lamp",
410
+ "60": "Carpet",
411
+ "61": "Cell Phone",
412
+ "62": "Bread",
413
+ "63": "Camera",
414
+ "64": "Canned",
415
+ "65": "Truck",
416
+ "66": "Traffic cone",
417
+ "67": "Cymbal",
418
+ "68": "Lifesaver",
419
+ "69": "Towel",
420
+ "7": "Glasses",
421
+ "70": "Stuffed Toy",
422
+ "71": "Candle",
423
+ "72": "Sailboat",
424
+ "73": "Laptop",
425
+ "74": "Awning",
426
+ "75": "Bed",
427
+ "76": "Faucet",
428
+ "77": "Tent",
429
+ "78": "Horse",
430
+ "79": "Mirror",
431
+ "8": "Bottle",
432
+ "80": "Power outlet",
433
+ "81": "Sink",
434
+ "82": "Apple",
435
+ "83": "Air Conditioner",
436
+ "84": "Knife",
437
+ "85": "Hockey Stick",
438
+ "86": "Paddle",
439
+ "87": "Pickup Truck",
440
+ "88": "Fork",
441
+ "89": "Traffic Sign",
442
+ "9": "Desk",
443
+ "90": "Balloon",
444
+ "91": "Tripod",
445
+ "92": "Dog",
446
+ "93": "Spoon",
447
+ "94": "Clock",
448
+ "95": "Pot",
449
+ "96": "Cow",
450
+ "97": "Cake",
451
+ "98": "Dining Table",
452
+ "99": "Sheep"
453
+ },
454
+ "init_std": 0.02,
455
+ "label2id": {
456
+ "Air Conditioner": 83,
457
+ "Airplane": 114,
458
+ "Ambulance": 248,
459
+ "American Football": 206,
460
+ "Antelope": 318,
461
+ "Apple": 82,
462
+ "Asparagus": 347,
463
+ "Avocado": 288,
464
+ "Awning": 74,
465
+ "Backpack": 38,
466
+ "Balloon": 90,
467
+ "Banana": 112,
468
+ "Baozi": 337,
469
+ "Barbell": 348,
470
+ "Barrel/bucket": 48,
471
+ "Baseball": 165,
472
+ "Baseball Bat": 137,
473
+ "Baseball Glove": 113,
474
+ "Basket": 52,
475
+ "Basketball": 177,
476
+ "Bathtub": 191,
477
+ "Bear": 295,
478
+ "Bed": 75,
479
+ "Belt": 36,
480
+ "Bench": 24,
481
+ "Bicycle": 46,
482
+ "Billiards": 189,
483
+ "Binoculars": 346,
484
+ "Blackboard/Whiteboard": 101,
485
+ "Blender": 235,
486
+ "Board Eraser": 297,
487
+ "Boat": 21,
488
+ "Book": 18,
489
+ "Boots": 29,
490
+ "Bottle": 8,
491
+ "Bow Tie": 151,
492
+ "Bowl/Basin": 26,
493
+ "Bracelet": 14,
494
+ "Bread": 62,
495
+ "Briefcase": 194,
496
+ "Broccoli": 141,
497
+ "Broom": 269,
498
+ "Brush": 256,
499
+ "Bus": 55,
500
+ "Butterfly": 321,
501
+ "CD": 284,
502
+ "Cabbage": 233,
503
+ "Cabinet/shelf": 12,
504
+ "Cake": 97,
505
+ "Calculator": 316,
506
+ "Camel": 307,
507
+ "Camera": 63,
508
+ "Candle": 71,
509
+ "Candy": 187,
510
+ "Canned": 64,
511
+ "Car": 5,
512
+ "Carpet": 60,
513
+ "Carriage": 215,
514
+ "Carrot": 152,
515
+ "Cat": 139,
516
+ "Cell Phone": 61,
517
+ "Cello": 229,
518
+ "Chainsaw": 356,
519
+ "Chair": 2,
520
+ "Cheese": 280,
521
+ "Cherry": 282,
522
+ "Chicken": 221,
523
+ "Chips": 303,
524
+ "Chopsticks": 162,
525
+ "Cigar/Cigarette": 196,
526
+ "Cleaning Products": 161,
527
+ "Clock": 94,
528
+ "Coconut": 298,
529
+ "Coffee Machine": 213,
530
+ "Coffee Table": 167,
531
+ "Comb": 351,
532
+ "Computer Box": 159,
533
+ "Converter": 190,
534
+ "Cookies": 174,
535
+ "Corn": 259,
536
+ "Cosmetics": 244,
537
+ "Cosmetics Brush/Eyeliner Pencil": 355,
538
+ "Cosmetics Mirror": 362,
539
+ "Couch": 50,
540
+ "Cow": 96,
541
+ "Crab": 311,
542
+ "Crane": 131,
543
+ "Crosswalk Sign": 305,
544
+ "Cucumber": 195,
545
+ "Cue": 287,
546
+ "Cup": 10,
547
+ "Curling": 363,
548
+ "Cutting/chopping Board": 166,
549
+ "Cymbal": 67,
550
+ "Deer": 240,
551
+ "Desk": 9,
552
+ "Dessert": 128,
553
+ "Dining Table": 98,
554
+ "Dishwasher": 310,
555
+ "Dog": 92,
556
+ "Dolphin": 326,
557
+ "Donkey": 323,
558
+ "Donut": 150,
559
+ "Drum": 53,
560
+ "Duck": 136,
561
+ "Dumbbell": 322,
562
+ "Dumpling": 352,
563
+ "Durian": 359,
564
+ "Egg": 185,
565
+ "Egg tart": 329,
566
+ "Eggplant": 296,
567
+ "Electric Drill": 327,
568
+ "Elephant": 144,
569
+ "Eraser": 357,
570
+ "Extension Cord": 202,
571
+ "Extractor": 201,
572
+ "Fan": 110,
573
+ "Faucet": 76,
574
+ "Fire Extinguisher": 186,
575
+ "Fire Hydrant": 176,
576
+ "Fire Truck": 188,
577
+ "Fishing Rod": 253,
578
+ "Flag": 27,
579
+ "Flask": 290,
580
+ "Flower": 23,
581
+ "Flute": 255,
582
+ "Folder": 205,
583
+ "Fork": 88,
584
+ "Formula 1": 308,
585
+ "French": 339,
586
+ "French Fries": 230,
587
+ "Frisbee": 219,
588
+ "Game board": 334,
589
+ "Garlic": 261,
590
+ "Gas stove": 149,
591
+ "Giraffe": 180,
592
+ "Glasses": 7,
593
+ "Globe": 302,
594
+ "Gloves": 19,
595
+ "Goldfish": 273,
596
+ "Golf Ball": 247,
597
+ "Golf Club": 193,
598
+ "Goose": 241,
599
+ "Grape": 179,
600
+ "Grapefruit": 333,
601
+ "Green Onion": 264,
602
+ "Green Vegetables": 111,
603
+ "Green beans": 217,
604
+ "Guitar": 59,
605
+ "Gun": 147,
606
+ "Hair Dryer": 328,
607
+ "Hamburger": 200,
608
+ "Hami melon": 289,
609
+ "Hammer": 286,
610
+ "Handbag/Satchel": 13,
611
+ "Hanger": 100,
612
+ "Hat": 4,
613
+ "Head Phone": 125,
614
+ "Heavy Truck": 199,
615
+ "Helicopter": 263,
616
+ "Helmet": 17,
617
+ "High Heels": 57,
618
+ "Hockey Stick": 85,
619
+ "Horse": 78,
620
+ "Hot dog": 234,
621
+ "Hot-air balloon": 228,
622
+ "Hoverboard": 312,
623
+ "Hurdle": 252,
624
+ "Ice cream": 227,
625
+ "Induction Cooker": 268,
626
+ "Jellyfish": 330,
627
+ "Jug": 140,
628
+ "Kettle": 209,
629
+ "Key": 251,
630
+ "Keyboard": 106,
631
+ "Kite": 154,
632
+ "Kiwi fruit": 274,
633
+ "Knife": 84,
634
+ "Ladder": 172,
635
+ "Lamp": 6,
636
+ "Lantern": 108,
637
+ "Laptop": 73,
638
+ "Leather Shoes": 22,
639
+ "Lemon": 135,
640
+ "Lettuce": 260,
641
+ "Lifesaver": 68,
642
+ "Lighter": 332,
643
+ "Lion": 324,
644
+ "Lipstick": 361,
645
+ "Lobster": 358,
646
+ "Luggage": 120,
647
+ "Machinery Vehicle": 109,
648
+ "Mango": 250,
649
+ "Marker": 170,
650
+ "Mask": 208,
651
+ "Meatball": 313,
652
+ "Medal": 254,
653
+ "Megaphone": 258,
654
+ "Microphone": 31,
655
+ "Microwave": 163,
656
+ "Mirror": 79,
657
+ "Monitor/TV": 37,
658
+ "Monkey": 341,
659
+ "Mop": 335,
660
+ "Motorcycle": 58,
661
+ "Mouse": 115,
662
+ "Mushroom": 291,
663
+ "N/A": 365,
664
+ "Napkin": 102,
665
+ "Necklace": 32,
666
+ "Nightstand": 121,
667
+ "Noddles": 350,
668
+ "Notepaper": 281,
669
+ "Nuts": 266,
670
+ "Okra": 360,
671
+ "Onion": 216,
672
+ "Orange/Tangerine": 104,
673
+ "Other Balls": 156,
674
+ "Other Fish": 103,
675
+ "Other Shoes": 3,
676
+ "Oven": 134,
677
+ "Oyster": 353,
678
+ "Paddle": 86,
679
+ "Paint Brush": 197,
680
+ "Papaya": 317,
681
+ "Parking meter": 249,
682
+ "Parrot": 319,
683
+ "Pasta": 285,
684
+ "Peach": 236,
685
+ "Pear": 198,
686
+ "Pen/Pencil": 54,
687
+ "Pencil Case": 343,
688
+ "Penguin": 257,
689
+ "Pepper": 158,
690
+ "Person": 0,
691
+ "Piano": 142,
692
+ "Pickup Truck": 87,
693
+ "Picture/Frame": 16,
694
+ "Pie": 171,
695
+ "Pig": 300,
696
+ "Pigeon": 164,
697
+ "Pillow": 28,
698
+ "Pineapple": 246,
699
+ "Pizza": 143,
700
+ "Plate": 15,
701
+ "Pliers": 283,
702
+ "Plum": 271,
703
+ "Poker Card": 276,
704
+ "Pomegranate": 309,
705
+ "Pot": 95,
706
+ "Potato": 181,
707
+ "Potted Plant": 25,
708
+ "Power outlet": 80,
709
+ "Printer": 222,
710
+ "Projector": 218,
711
+ "Pumpkin": 117,
712
+ "Rabbit": 342,
713
+ "Radiator": 175,
714
+ "Radish": 336,
715
+ "Recorder": 294,
716
+ "Red Cabbage": 345,
717
+ "Refrigerator": 133,
718
+ "Remote": 132,
719
+ "Rice": 237,
720
+ "Rice Cooker": 314,
721
+ "Rickshaw": 272,
722
+ "Ring": 33,
723
+ "Router/modem": 275,
724
+ "SUV": 34,
725
+ "Sailboat": 72,
726
+ "Sandals": 51,
727
+ "Sandwich": 265,
728
+ "Sausage": 182,
729
+ "Saxophone": 224,
730
+ "Scale": 231,
731
+ "Scallop": 349,
732
+ "Scissors": 169,
733
+ "Scooter": 129,
734
+ "Screwdriver": 292,
735
+ "Seal": 320,
736
+ "Sheep": 99,
737
+ "Ship": 211,
738
+ "Shovel": 157,
739
+ "Showerhead": 301,
740
+ "Shrimp": 278,
741
+ "Side Table": 168,
742
+ "Sink": 81,
743
+ "Skateboard": 145,
744
+ "Skating and Skiing shoes": 148,
745
+ "Skiboard": 119,
746
+ "Slide": 214,
747
+ "Slippers": 45,
748
+ "Sneakers": 1,
749
+ "Snowboard": 173,
750
+ "Soap": 293,
751
+ "Soccer": 118,
752
+ "Speaker": 41,
753
+ "Speed Limit Sign": 267,
754
+ "Spoon": 93,
755
+ "Sports Car": 126,
756
+ "Spring Rolls": 340,
757
+ "Stapler": 306,
758
+ "Steak": 304,
759
+ "Stool": 47,
760
+ "Stop Sign": 127,
761
+ "Storage box": 20,
762
+ "Strawberry": 155,
763
+ "Street Lights": 11,
764
+ "Stroller": 130,
765
+ "Stuffed Toy": 70,
766
+ "Surfboard": 146,
767
+ "Surveillance Camera": 138,
768
+ "Sushi": 279,
769
+ "Swan": 262,
770
+ "Swing": 212,
771
+ "Table Tennis": 364,
772
+ "Table Tennis paddle": 354,
773
+ "Tablet": 243,
774
+ "Tape": 242,
775
+ "Tape Measure/Ruler": 299,
776
+ "Target": 338,
777
+ "Tea pot": 122,
778
+ "Telephone": 123,
779
+ "Tennis": 210,
780
+ "Tennis Racket": 204,
781
+ "Tent": 77,
782
+ "Tie": 43,
783
+ "Tissue": 225,
784
+ "Toaster": 277,
785
+ "Toilet": 153,
786
+ "Toilet Paper": 160,
787
+ "Toiletry": 105,
788
+ "Tomato": 107,
789
+ "Tong": 203,
790
+ "Toothbrush": 226,
791
+ "Towel": 69,
792
+ "Traffic Light": 40,
793
+ "Traffic Sign": 89,
794
+ "Traffic cone": 66,
795
+ "Train": 116,
796
+ "Trash bin Can": 44,
797
+ "Treadmill": 331,
798
+ "Tricycle": 183,
799
+ "Tripod": 91,
800
+ "Trolley": 124,
801
+ "Trombone": 270,
802
+ "Trophy": 232,
803
+ "Truck": 65,
804
+ "Trumpet": 245,
805
+ "Tuba": 315,
806
+ "Umbrella": 39,
807
+ "Urinal": 325,
808
+ "Van": 49,
809
+ "Vase": 30,
810
+ "Violin": 184,
811
+ "Volleyball": 239,
812
+ "Wallet/Purse": 238,
813
+ "Washing Machine/Drying Machine": 220,
814
+ "Watch": 42,
815
+ "Watermelon": 223,
816
+ "Wheelchair": 192,
817
+ "Wild Bird": 56,
818
+ "Wine Glass": 35,
819
+ "Yak": 344,
820
+ "Zebra": 178,
821
+ "earphone": 207
822
+ },
823
+ "model_type": "lw_detr",
824
+ "num_feature_levels": 1,
825
+ "num_queries": 300,
826
+ "projector_in_channels": [
827
+ 256
828
+ ],
829
+ "projector_out_channels": 256,
830
+ "projector_scale_factors": [
831
+ 1.0
832
+ ],
833
+ "transformers_version": "5.0.0.dev0",
834
+ "use_pretrained_backbone": false,
835
+ "use_timm_backbone": false
836
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7de07cea05915e84d8505e7412389d7cf94faa24f4c5a4ae12cc299581998418
3
+ size 116974744
preprocessor_config.json ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "do_convert_annotations": true,
3
+ "do_normalize": true,
4
+ "do_pad": true,
5
+ "do_rescale": true,
6
+ "do_resize": true,
7
+ "format": "coco_detection",
8
+ "image_mean": [
9
+ 0.485,
10
+ 0.456,
11
+ 0.406
12
+ ],
13
+ "image_processor_type": "DeformableDetrImageProcessor",
14
+ "image_std": [
15
+ 0.229,
16
+ 0.224,
17
+ 0.225
18
+ ],
19
+ "pad_size": null,
20
+ "resample": 2,
21
+ "rescale_factor": 0.00392156862745098,
22
+ "size": {
23
+ "height": 640,
24
+ "width": 640
25
+ }
26
+ }