Wi11Chan committed
Commit c95b30b · 1 Parent(s): 34ca0a6

Upload ViTForSemanticSegmentation

Files changed (4)
  1. config.json +2031 -0
  2. configuration_vit.py +144 -0
  3. modeling_vit.py +960 -0
  4. pytorch_model.bin +3 -0
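
Since config.json registers custom classes via its "auto_map" entry, the checkpoint has to be loaded with trust_remote_code=True so that transformers can import configuration_vit.py and modeling_vit.py straight from the repo. A minimal loading sketch in Python (the repo id below is a placeholder for wherever this commit lives, not something this page confirms):

from transformers import AutoConfig, AutoModelForSemanticSegmentation

repo_id = "Wi11Chan/ViTForSemanticSegmentation"  # placeholder repo id (assumption)

# trust_remote_code=True lets transformers resolve the "auto_map" entries in
# config.json to the custom ViTConfig / ViTForSemanticSegmentation classes
# shipped in configuration_vit.py and modeling_vit.py.
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSemanticSegmentation.from_pretrained(repo_id, trust_remote_code=True)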
config.json ADDED
@@ -0,0 +1,2031 @@
+ {
+ "_name_or_path": "google/vit-base-patch16-224",
+ "architectures": [
+ "ViTForSemanticSegmentation"
+ ],
+ "attention_probs_dropout_prob": 0.0,
+ "auto_map": {
+ "AutoConfig": "configuration_vit.ViTConfig",
+ "AutoModelForSemanticSegmentation": "modeling_vit.ViTForSemanticSegmentation"
+ },
+ "encoder_stride": 16,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.0,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "tench, Tinca tinca",
+ "1": "goldfish, Carassius auratus",
+ "2": "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
+ "3": "tiger shark, Galeocerdo cuvieri",
+ "4": "hammerhead, hammerhead shark",
+ "5": "electric ray, crampfish, numbfish, torpedo",
+ "6": "stingray",
+ "7": "cock",
+ "8": "hen",
+ "9": "ostrich, Struthio camelus",
+ "10": "brambling, Fringilla montifringilla",
+ "11": "goldfinch, Carduelis carduelis",
+ "12": "house finch, linnet, Carpodacus mexicanus",
+ "13": "junco, snowbird",
+ "14": "indigo bunting, indigo finch, indigo bird, Passerina cyanea",
+ "15": "robin, American robin, Turdus migratorius",
+ "16": "bulbul",
+ "17": "jay",
+ "18": "magpie",
+ "19": "chickadee",
+ "20": "water ouzel, dipper",
+ "21": "kite",
+ "22": "bald eagle, American eagle, Haliaeetus leucocephalus",
+ "23": "vulture",
+ "24": "great grey owl, great gray owl, Strix nebulosa",
+ "25": "European fire salamander, Salamandra salamandra",
+ "26": "common newt, Triturus vulgaris",
+ "27": "eft",
+ "28": "spotted salamander, Ambystoma maculatum",
+ "29": "axolotl, mud puppy, Ambystoma mexicanum",
+ "30": "bullfrog, Rana catesbeiana",
+ "31": "tree frog, tree-frog",
+ "32": "tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui",
+ "33": "loggerhead, loggerhead turtle, Caretta caretta",
+ "34": "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea",
+ "35": "mud turtle",
+ "36": "terrapin",
+ "37": "box turtle, box tortoise",
+ "38": "banded gecko",
+ "39": "common iguana, iguana, Iguana iguana",
+ "40": "American chameleon, anole, Anolis carolinensis",
+ "41": "whiptail, whiptail lizard",
+ "42": "agama",
+ "43": "frilled lizard, Chlamydosaurus kingi",
+ "44": "alligator lizard",
+ "45": "Gila monster, Heloderma suspectum",
+ "46": "green lizard, Lacerta viridis",
+ "47": "African chameleon, Chamaeleo chamaeleon",
+ "48": "Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis",
+ "49": "African crocodile, Nile crocodile, Crocodylus niloticus",
+ "50": "American alligator, Alligator mississipiensis",
+ "51": "triceratops",
+ "52": "thunder snake, worm snake, Carphophis amoenus",
+ "53": "ringneck snake, ring-necked snake, ring snake",
+ "54": "hognose snake, puff adder, sand viper",
+ "55": "green snake, grass snake",
+ "56": "king snake, kingsnake",
+ "57": "garter snake, grass snake",
+ "58": "water snake",
+ "59": "vine snake",
+ "60": "night snake, Hypsiglena torquata",
+ "61": "boa constrictor, Constrictor constrictor",
+ "62": "rock python, rock snake, Python sebae",
+ "63": "Indian cobra, Naja naja",
+ "64": "green mamba",
+ "65": "sea snake",
+ "66": "horned viper, cerastes, sand viper, horned asp, Cerastes cornutus",
+ "67": "diamondback, diamondback rattlesnake, Crotalus adamanteus",
+ "68": "sidewinder, horned rattlesnake, Crotalus cerastes",
+ "69": "trilobite",
+ "70": "harvestman, daddy longlegs, Phalangium opilio",
+ "71": "scorpion",
+ "72": "black and gold garden spider, Argiope aurantia",
+ "73": "barn spider, Araneus cavaticus",
+ "74": "garden spider, Aranea diademata",
+ "75": "black widow, Latrodectus mactans",
+ "76": "tarantula",
+ "77": "wolf spider, hunting spider",
+ "78": "tick",
+ "79": "centipede",
+ "80": "black grouse",
+ "81": "ptarmigan",
+ "82": "ruffed grouse, partridge, Bonasa umbellus",
+ "83": "prairie chicken, prairie grouse, prairie fowl",
+ "84": "peacock",
+ "85": "quail",
+ "86": "partridge",
+ "87": "African grey, African gray, Psittacus erithacus",
+ "88": "macaw",
+ "89": "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita",
+ "90": "lorikeet",
+ "91": "coucal",
+ "92": "bee eater",
+ "93": "hornbill",
+ "94": "hummingbird",
+ "95": "jacamar",
+ "96": "toucan",
+ "97": "drake",
+ "98": "red-breasted merganser, Mergus serrator",
+ "99": "goose",
+ "100": "black swan, Cygnus atratus",
+ "101": "tusker",
+ "102": "echidna, spiny anteater, anteater",
+ "103": "platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus",
+ "104": "wallaby, brush kangaroo",
+ "105": "koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus",
+ "106": "wombat",
+ "107": "jellyfish",
+ "108": "sea anemone, anemone",
+ "109": "brain coral",
+ "110": "flatworm, platyhelminth",
+ "111": "nematode, nematode worm, roundworm",
+ "112": "conch",
+ "113": "snail",
+ "114": "slug",
+ "115": "sea slug, nudibranch",
+ "116": "chiton, coat-of-mail shell, sea cradle, polyplacophore",
+ "117": "chambered nautilus, pearly nautilus, nautilus",
+ "118": "Dungeness crab, Cancer magister",
+ "119": "rock crab, Cancer irroratus",
+ "120": "fiddler crab",
+ "121": "king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica",
+ "122": "American lobster, Northern lobster, Maine lobster, Homarus americanus",
+ "123": "spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish",
+ "124": "crayfish, crawfish, crawdad, crawdaddy",
+ "125": "hermit crab",
+ "126": "isopod",
+ "127": "white stork, Ciconia ciconia",
+ "128": "black stork, Ciconia nigra",
+ "129": "spoonbill",
+ "130": "flamingo",
+ "131": "little blue heron, Egretta caerulea",
+ "132": "American egret, great white heron, Egretta albus",
+ "133": "bittern",
+ "134": "crane",
+ "135": "limpkin, Aramus pictus",
+ "136": "European gallinule, Porphyrio porphyrio",
+ "137": "American coot, marsh hen, mud hen, water hen, Fulica americana",
+ "138": "bustard",
+ "139": "ruddy turnstone, Arenaria interpres",
+ "140": "red-backed sandpiper, dunlin, Erolia alpina",
+ "141": "redshank, Tringa totanus",
+ "142": "dowitcher",
+ "143": "oystercatcher, oyster catcher",
+ "144": "pelican",
+ "145": "king penguin, Aptenodytes patagonica",
+ "146": "albatross, mollymawk",
+ "147": "grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus",
+ "148": "killer whale, killer, orca, grampus, sea wolf, Orcinus orca",
+ "149": "dugong, Dugong dugon",
+ "150": "sea lion",
+ "151": "Chihuahua",
+ "152": "Japanese spaniel",
+ "153": "Maltese dog, Maltese terrier, Maltese",
+ "154": "Pekinese, Pekingese, Peke",
+ "155": "Shih-Tzu",
+ "156": "Blenheim spaniel",
+ "157": "papillon",
+ "158": "toy terrier",
+ "159": "Rhodesian ridgeback",
+ "160": "Afghan hound, Afghan",
+ "161": "basset, basset hound",
+ "162": "beagle",
+ "163": "bloodhound, sleuthhound",
+ "164": "bluetick",
+ "165": "black-and-tan coonhound",
+ "166": "Walker hound, Walker foxhound",
+ "167": "English foxhound",
+ "168": "redbone",
+ "169": "borzoi, Russian wolfhound",
+ "170": "Irish wolfhound",
+ "171": "Italian greyhound",
+ "172": "whippet",
+ "173": "Ibizan hound, Ibizan Podenco",
+ "174": "Norwegian elkhound, elkhound",
+ "175": "otterhound, otter hound",
+ "176": "Saluki, gazelle hound",
+ "177": "Scottish deerhound, deerhound",
+ "178": "Weimaraner",
+ "179": "Staffordshire bullterrier, Staffordshire bull terrier",
+ "180": "American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier",
+ "181": "Bedlington terrier",
+ "182": "Border terrier",
+ "183": "Kerry blue terrier",
+ "184": "Irish terrier",
+ "185": "Norfolk terrier",
+ "186": "Norwich terrier",
+ "187": "Yorkshire terrier",
+ "188": "wire-haired fox terrier",
+ "189": "Lakeland terrier",
+ "190": "Sealyham terrier, Sealyham",
+ "191": "Airedale, Airedale terrier",
+ "192": "cairn, cairn terrier",
+ "193": "Australian terrier",
+ "194": "Dandie Dinmont, Dandie Dinmont terrier",
+ "195": "Boston bull, Boston terrier",
+ "196": "miniature schnauzer",
+ "197": "giant schnauzer",
+ "198": "standard schnauzer",
+ "199": "Scotch terrier, Scottish terrier, Scottie",
+ "200": "Tibetan terrier, chrysanthemum dog",
+ "201": "silky terrier, Sydney silky",
+ "202": "soft-coated wheaten terrier",
+ "203": "West Highland white terrier",
+ "204": "Lhasa, Lhasa apso",
+ "205": "flat-coated retriever",
+ "206": "curly-coated retriever",
+ "207": "golden retriever",
+ "208": "Labrador retriever",
+ "209": "Chesapeake Bay retriever",
+ "210": "German short-haired pointer",
+ "211": "vizsla, Hungarian pointer",
+ "212": "English setter",
+ "213": "Irish setter, red setter",
+ "214": "Gordon setter",
+ "215": "Brittany spaniel",
+ "216": "clumber, clumber spaniel",
+ "217": "English springer, English springer spaniel",
+ "218": "Welsh springer spaniel",
+ "219": "cocker spaniel, English cocker spaniel, cocker",
+ "220": "Sussex spaniel",
+ "221": "Irish water spaniel",
+ "222": "kuvasz",
+ "223": "schipperke",
+ "224": "groenendael",
+ "225": "malinois",
+ "226": "briard",
+ "227": "kelpie",
+ "228": "komondor",
+ "229": "Old English sheepdog, bobtail",
+ "230": "Shetland sheepdog, Shetland sheep dog, Shetland",
+ "231": "collie",
+ "232": "Border collie",
+ "233": "Bouvier des Flandres, Bouviers des Flandres",
+ "234": "Rottweiler",
+ "235": "German shepherd, German shepherd dog, German police dog, alsatian",
+ "236": "Doberman, Doberman pinscher",
+ "237": "miniature pinscher",
+ "238": "Greater Swiss Mountain dog",
+ "239": "Bernese mountain dog",
+ "240": "Appenzeller",
+ "241": "EntleBucher",
+ "242": "boxer",
+ "243": "bull mastiff",
+ "244": "Tibetan mastiff",
+ "245": "French bulldog",
+ "246": "Great Dane",
+ "247": "Saint Bernard, St Bernard",
+ "248": "Eskimo dog, husky",
+ "249": "malamute, malemute, Alaskan malamute",
+ "250": "Siberian husky",
+ "251": "dalmatian, coach dog, carriage dog",
+ "252": "affenpinscher, monkey pinscher, monkey dog",
+ "253": "basenji",
+ "254": "pug, pug-dog",
+ "255": "Leonberg",
+ "256": "Newfoundland, Newfoundland dog",
+ "257": "Great Pyrenees",
+ "258": "Samoyed, Samoyede",
+ "259": "Pomeranian",
+ "260": "chow, chow chow",
+ "261": "keeshond",
+ "262": "Brabancon griffon",
+ "263": "Pembroke, Pembroke Welsh corgi",
+ "264": "Cardigan, Cardigan Welsh corgi",
+ "265": "toy poodle",
+ "266": "miniature poodle",
+ "267": "standard poodle",
+ "268": "Mexican hairless",
+ "269": "timber wolf, grey wolf, gray wolf, Canis lupus",
+ "270": "white wolf, Arctic wolf, Canis lupus tundrarum",
+ "271": "red wolf, maned wolf, Canis rufus, Canis niger",
+ "272": "coyote, prairie wolf, brush wolf, Canis latrans",
+ "273": "dingo, warrigal, warragal, Canis dingo",
+ "274": "dhole, Cuon alpinus",
+ "275": "African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus",
+ "276": "hyena, hyaena",
+ "277": "red fox, Vulpes vulpes",
+ "278": "kit fox, Vulpes macrotis",
+ "279": "Arctic fox, white fox, Alopex lagopus",
+ "280": "grey fox, gray fox, Urocyon cinereoargenteus",
+ "281": "tabby, tabby cat",
+ "282": "tiger cat",
+ "283": "Persian cat",
+ "284": "Siamese cat, Siamese",
+ "285": "Egyptian cat",
+ "286": "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor",
+ "287": "lynx, catamount",
+ "288": "leopard, Panthera pardus",
+ "289": "snow leopard, ounce, Panthera uncia",
+ "290": "jaguar, panther, Panthera onca, Felis onca",
+ "291": "lion, king of beasts, Panthera leo",
+ "292": "tiger, Panthera tigris",
+ "293": "cheetah, chetah, Acinonyx jubatus",
+ "294": "brown bear, bruin, Ursus arctos",
+ "295": "American black bear, black bear, Ursus americanus, Euarctos americanus",
+ "296": "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus",
+ "297": "sloth bear, Melursus ursinus, Ursus ursinus",
+ "298": "mongoose",
+ "299": "meerkat, mierkat",
+ "300": "tiger beetle",
+ "301": "ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle",
+ "302": "ground beetle, carabid beetle",
+ "303": "long-horned beetle, longicorn, longicorn beetle",
+ "304": "leaf beetle, chrysomelid",
+ "305": "dung beetle",
+ "306": "rhinoceros beetle",
+ "307": "weevil",
+ "308": "fly",
+ "309": "bee",
+ "310": "ant, emmet, pismire",
+ "311": "grasshopper, hopper",
+ "312": "cricket",
+ "313": "walking stick, walkingstick, stick insect",
+ "314": "cockroach, roach",
+ "315": "mantis, mantid",
+ "316": "cicada, cicala",
+ "317": "leafhopper",
+ "318": "lacewing, lacewing fly",
+ "319": "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk",
+ "320": "damselfly",
+ "321": "admiral",
+ "322": "ringlet, ringlet butterfly",
+ "323": "monarch, monarch butterfly, milkweed butterfly, Danaus plexippus",
+ "324": "cabbage butterfly",
+ "325": "sulphur butterfly, sulfur butterfly",
+ "326": "lycaenid, lycaenid butterfly",
+ "327": "starfish, sea star",
+ "328": "sea urchin",
+ "329": "sea cucumber, holothurian",
+ "330": "wood rabbit, cottontail, cottontail rabbit",
+ "331": "hare",
+ "332": "Angora, Angora rabbit",
+ "333": "hamster",
+ "334": "porcupine, hedgehog",
+ "335": "fox squirrel, eastern fox squirrel, Sciurus niger",
+ "336": "marmot",
+ "337": "beaver",
+ "338": "guinea pig, Cavia cobaya",
+ "339": "sorrel",
+ "340": "zebra",
+ "341": "hog, pig, grunter, squealer, Sus scrofa",
+ "342": "wild boar, boar, Sus scrofa",
+ "343": "warthog",
+ "344": "hippopotamus, hippo, river horse, Hippopotamus amphibius",
+ "345": "ox",
+ "346": "water buffalo, water ox, Asiatic buffalo, Bubalus bubalis",
+ "347": "bison",
+ "348": "ram, tup",
+ "349": "bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis",
+ "350": "ibex, Capra ibex",
+ "351": "hartebeest",
+ "352": "impala, Aepyceros melampus",
+ "353": "gazelle",
+ "354": "Arabian camel, dromedary, Camelus dromedarius",
+ "355": "llama",
+ "356": "weasel",
+ "357": "mink",
+ "358": "polecat, fitch, foulmart, foumart, Mustela putorius",
+ "359": "black-footed ferret, ferret, Mustela nigripes",
+ "360": "otter",
+ "361": "skunk, polecat, wood pussy",
+ "362": "badger",
+ "363": "armadillo",
+ "364": "three-toed sloth, ai, Bradypus tridactylus",
+ "365": "orangutan, orang, orangutang, Pongo pygmaeus",
+ "366": "gorilla, Gorilla gorilla",
+ "367": "chimpanzee, chimp, Pan troglodytes",
+ "368": "gibbon, Hylobates lar",
+ "369": "siamang, Hylobates syndactylus, Symphalangus syndactylus",
+ "370": "guenon, guenon monkey",
+ "371": "patas, hussar monkey, Erythrocebus patas",
+ "372": "baboon",
+ "373": "macaque",
+ "374": "langur",
+ "375": "colobus, colobus monkey",
+ "376": "proboscis monkey, Nasalis larvatus",
+ "377": "marmoset",
+ "378": "capuchin, ringtail, Cebus capucinus",
+ "379": "howler monkey, howler",
+ "380": "titi, titi monkey",
+ "381": "spider monkey, Ateles geoffroyi",
+ "382": "squirrel monkey, Saimiri sciureus",
+ "383": "Madagascar cat, ring-tailed lemur, Lemur catta",
+ "384": "indri, indris, Indri indri, Indri brevicaudatus",
+ "385": "Indian elephant, Elephas maximus",
+ "386": "African elephant, Loxodonta africana",
+ "387": "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens",
+ "388": "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca",
+ "389": "barracouta, snoek",
+ "390": "eel",
+ "391": "coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch",
+ "392": "rock beauty, Holocanthus tricolor",
+ "393": "anemone fish",
+ "394": "sturgeon",
+ "395": "gar, garfish, garpike, billfish, Lepisosteus osseus",
+ "396": "lionfish",
+ "397": "puffer, pufferfish, blowfish, globefish",
+ "398": "abacus",
+ "399": "abaya",
+ "400": "academic gown, academic robe, judge's robe",
+ "401": "accordion, piano accordion, squeeze box",
+ "402": "acoustic guitar",
+ "403": "aircraft carrier, carrier, flattop, attack aircraft carrier",
+ "404": "airliner",
+ "405": "airship, dirigible",
+ "406": "altar",
+ "407": "ambulance",
+ "408": "amphibian, amphibious vehicle",
+ "409": "analog clock",
+ "410": "apiary, bee house",
+ "411": "apron",
+ "412": "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin",
+ "413": "assault rifle, assault gun",
+ "414": "backpack, back pack, knapsack, packsack, rucksack, haversack",
+ "415": "bakery, bakeshop, bakehouse",
+ "416": "balance beam, beam",
+ "417": "balloon",
+ "418": "ballpoint, ballpoint pen, ballpen, Biro",
+ "419": "Band Aid",
+ "420": "banjo",
+ "421": "bannister, banister, balustrade, balusters, handrail",
+ "422": "barbell",
+ "423": "barber chair",
+ "424": "barbershop",
+ "425": "barn",
+ "426": "barometer",
+ "427": "barrel, cask",
+ "428": "barrow, garden cart, lawn cart, wheelbarrow",
+ "429": "baseball",
+ "430": "basketball",
+ "431": "bassinet",
+ "432": "bassoon",
+ "433": "bathing cap, swimming cap",
+ "434": "bath towel",
+ "435": "bathtub, bathing tub, bath, tub",
+ "436": "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon",
+ "437": "beacon, lighthouse, beacon light, pharos",
+ "438": "beaker",
+ "439": "bearskin, busby, shako",
+ "440": "beer bottle",
+ "441": "beer glass",
+ "442": "bell cote, bell cot",
+ "443": "bib",
+ "444": "bicycle-built-for-two, tandem bicycle, tandem",
+ "445": "bikini, two-piece",
+ "446": "binder, ring-binder",
+ "447": "binoculars, field glasses, opera glasses",
+ "448": "birdhouse",
+ "449": "boathouse",
+ "450": "bobsled, bobsleigh, bob",
+ "451": "bolo tie, bolo, bola tie, bola",
+ "452": "bonnet, poke bonnet",
+ "453": "bookcase",
+ "454": "bookshop, bookstore, bookstall",
+ "455": "bottlecap",
+ "456": "bow",
+ "457": "bow tie, bow-tie, bowtie",
+ "458": "brass, memorial tablet, plaque",
+ "459": "brassiere, bra, bandeau",
+ "460": "breakwater, groin, groyne, mole, bulwark, seawall, jetty",
+ "461": "breastplate, aegis, egis",
+ "462": "broom",
+ "463": "bucket, pail",
+ "464": "buckle",
+ "465": "bulletproof vest",
+ "466": "bullet train, bullet",
+ "467": "butcher shop, meat market",
+ "468": "cab, hack, taxi, taxicab",
+ "469": "caldron, cauldron",
+ "470": "candle, taper, wax light",
+ "471": "cannon",
+ "472": "canoe",
+ "473": "can opener, tin opener",
+ "474": "cardigan",
+ "475": "car mirror",
+ "476": "carousel, carrousel, merry-go-round, roundabout, whirligig",
+ "477": "carpenter's kit, tool kit",
+ "478": "carton",
+ "479": "car wheel",
+ "480": "cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM",
+ "481": "cassette",
+ "482": "cassette player",
+ "483": "castle",
+ "484": "catamaran",
+ "485": "CD player",
+ "486": "cello, violoncello",
+ "487": "cellular telephone, cellular phone, cellphone, cell, mobile phone",
+ "488": "chain",
+ "489": "chainlink fence",
+ "490": "chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour",
+ "491": "chain saw, chainsaw",
+ "492": "chest",
+ "493": "chiffonier, commode",
+ "494": "chime, bell, gong",
+ "495": "china cabinet, china closet",
+ "496": "Christmas stocking",
+ "497": "church, church building",
+ "498": "cinema, movie theater, movie theatre, movie house, picture palace",
+ "499": "cleaver, meat cleaver, chopper",
+ "500": "cliff dwelling",
+ "501": "cloak",
+ "502": "clog, geta, patten, sabot",
+ "503": "cocktail shaker",
+ "504": "coffee mug",
+ "505": "coffeepot",
+ "506": "coil, spiral, volute, whorl, helix",
+ "507": "combination lock",
+ "508": "computer keyboard, keypad",
+ "509": "confectionery, confectionary, candy store",
+ "510": "container ship, containership, container vessel",
+ "511": "convertible",
+ "512": "corkscrew, bottle screw",
+ "513": "cornet, horn, trumpet, trump",
+ "514": "cowboy boot",
+ "515": "cowboy hat, ten-gallon hat",
+ "516": "cradle",
+ "517": "crane",
+ "518": "crash helmet",
+ "519": "crate",
+ "520": "crib, cot",
+ "521": "Crock Pot",
+ "522": "croquet ball",
+ "523": "crutch",
+ "524": "cuirass",
+ "525": "dam, dike, dyke",
+ "526": "desk",
+ "527": "desktop computer",
+ "528": "dial telephone, dial phone",
+ "529": "diaper, nappy, napkin",
+ "530": "digital clock",
+ "531": "digital watch",
+ "532": "dining table, board",
+ "533": "dishrag, dishcloth",
+ "534": "dishwasher, dish washer, dishwashing machine",
+ "535": "disk brake, disc brake",
+ "536": "dock, dockage, docking facility",
+ "537": "dogsled, dog sled, dog sleigh",
+ "538": "dome",
+ "539": "doormat, welcome mat",
+ "540": "drilling platform, offshore rig",
+ "541": "drum, membranophone, tympan",
+ "542": "drumstick",
+ "543": "dumbbell",
+ "544": "Dutch oven",
+ "545": "electric fan, blower",
+ "546": "electric guitar",
+ "547": "electric locomotive",
+ "548": "entertainment center",
+ "549": "envelope",
+ "550": "espresso maker",
+ "551": "face powder",
+ "552": "feather boa, boa",
+ "553": "file, file cabinet, filing cabinet",
+ "554": "fireboat",
+ "555": "fire engine, fire truck",
+ "556": "fire screen, fireguard",
+ "557": "flagpole, flagstaff",
+ "558": "flute, transverse flute",
+ "559": "folding chair",
+ "560": "football helmet",
+ "561": "forklift",
+ "562": "fountain",
+ "563": "fountain pen",
+ "564": "four-poster",
+ "565": "freight car",
+ "566": "French horn, horn",
+ "567": "frying pan, frypan, skillet",
+ "568": "fur coat",
+ "569": "garbage truck, dustcart",
+ "570": "gasmask, respirator, gas helmet",
+ "571": "gas pump, gasoline pump, petrol pump, island dispenser",
+ "572": "goblet",
+ "573": "go-kart",
+ "574": "golf ball",
+ "575": "golfcart, golf cart",
+ "576": "gondola",
+ "577": "gong, tam-tam",
+ "578": "gown",
+ "579": "grand piano, grand",
+ "580": "greenhouse, nursery, glasshouse",
+ "581": "grille, radiator grille",
+ "582": "grocery store, grocery, food market, market",
+ "583": "guillotine",
+ "584": "hair slide",
+ "585": "hair spray",
+ "586": "half track",
+ "587": "hammer",
+ "588": "hamper",
+ "589": "hand blower, blow dryer, blow drier, hair dryer, hair drier",
+ "590": "hand-held computer, hand-held microcomputer",
+ "591": "handkerchief, hankie, hanky, hankey",
+ "592": "hard disc, hard disk, fixed disk",
+ "593": "harmonica, mouth organ, harp, mouth harp",
+ "594": "harp",
+ "595": "harvester, reaper",
+ "596": "hatchet",
+ "597": "holster",
+ "598": "home theater, home theatre",
+ "599": "honeycomb",
+ "600": "hook, claw",
+ "601": "hoopskirt, crinoline",
+ "602": "horizontal bar, high bar",
+ "603": "horse cart, horse-cart",
+ "604": "hourglass",
+ "605": "iPod",
+ "606": "iron, smoothing iron",
+ "607": "jack-o'-lantern",
+ "608": "jean, blue jean, denim",
+ "609": "jeep, landrover",
+ "610": "jersey, T-shirt, tee shirt",
+ "611": "jigsaw puzzle",
+ "612": "jinrikisha, ricksha, rickshaw",
+ "613": "joystick",
+ "614": "kimono",
+ "615": "knee pad",
+ "616": "knot",
+ "617": "lab coat, laboratory coat",
+ "618": "ladle",
+ "619": "lampshade, lamp shade",
+ "620": "laptop, laptop computer",
+ "621": "lawn mower, mower",
+ "622": "lens cap, lens cover",
+ "623": "letter opener, paper knife, paperknife",
+ "624": "library",
+ "625": "lifeboat",
+ "626": "lighter, light, igniter, ignitor",
+ "627": "limousine, limo",
+ "628": "liner, ocean liner",
+ "629": "lipstick, lip rouge",
+ "630": "Loafer",
+ "631": "lotion",
+ "632": "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system",
+ "633": "loupe, jeweler's loupe",
+ "634": "lumbermill, sawmill",
+ "635": "magnetic compass",
+ "636": "mailbag, postbag",
+ "637": "mailbox, letter box",
+ "638": "maillot",
+ "639": "maillot, tank suit",
+ "640": "manhole cover",
+ "641": "maraca",
+ "642": "marimba, xylophone",
+ "643": "mask",
+ "644": "matchstick",
+ "645": "maypole",
+ "646": "maze, labyrinth",
+ "647": "measuring cup",
+ "648": "medicine chest, medicine cabinet",
+ "649": "megalith, megalithic structure",
+ "650": "microphone, mike",
+ "651": "microwave, microwave oven",
+ "652": "military uniform",
+ "653": "milk can",
+ "654": "minibus",
+ "655": "miniskirt, mini",
+ "656": "minivan",
+ "657": "missile",
+ "658": "mitten",
+ "659": "mixing bowl",
+ "660": "mobile home, manufactured home",
+ "661": "Model T",
+ "662": "modem",
+ "663": "monastery",
+ "664": "monitor",
+ "665": "moped",
+ "666": "mortar",
+ "667": "mortarboard",
+ "668": "mosque",
+ "669": "mosquito net",
+ "670": "motor scooter, scooter",
+ "671": "mountain bike, all-terrain bike, off-roader",
+ "672": "mountain tent",
+ "673": "mouse, computer mouse",
+ "674": "mousetrap",
+ "675": "moving van",
+ "676": "muzzle",
+ "677": "nail",
+ "678": "neck brace",
+ "679": "necklace",
+ "680": "nipple",
+ "681": "notebook, notebook computer",
+ "682": "obelisk",
+ "683": "oboe, hautboy, hautbois",
+ "684": "ocarina, sweet potato",
+ "685": "odometer, hodometer, mileometer, milometer",
+ "686": "oil filter",
+ "687": "organ, pipe organ",
+ "688": "oscilloscope, scope, cathode-ray oscilloscope, CRO",
+ "689": "overskirt",
+ "690": "oxcart",
+ "691": "oxygen mask",
+ "692": "packet",
+ "693": "paddle, boat paddle",
+ "694": "paddlewheel, paddle wheel",
+ "695": "padlock",
+ "696": "paintbrush",
+ "697": "pajama, pyjama, pj's, jammies",
+ "698": "palace",
+ "699": "panpipe, pandean pipe, syrinx",
+ "700": "paper towel",
+ "701": "parachute, chute",
+ "702": "parallel bars, bars",
+ "703": "park bench",
+ "704": "parking meter",
+ "705": "passenger car, coach, carriage",
+ "706": "patio, terrace",
+ "707": "pay-phone, pay-station",
+ "708": "pedestal, plinth, footstall",
+ "709": "pencil box, pencil case",
+ "710": "pencil sharpener",
+ "711": "perfume, essence",
+ "712": "Petri dish",
+ "713": "photocopier",
+ "714": "pick, plectrum, plectron",
+ "715": "pickelhaube",
+ "716": "picket fence, paling",
+ "717": "pickup, pickup truck",
+ "718": "pier",
+ "719": "piggy bank, penny bank",
+ "720": "pill bottle",
+ "721": "pillow",
+ "722": "ping-pong ball",
+ "723": "pinwheel",
+ "724": "pirate, pirate ship",
+ "725": "pitcher, ewer",
+ "726": "plane, carpenter's plane, woodworking plane",
+ "727": "planetarium",
+ "728": "plastic bag",
+ "729": "plate rack",
+ "730": "plow, plough",
+ "731": "plunger, plumber's helper",
+ "732": "Polaroid camera, Polaroid Land camera",
+ "733": "pole",
+ "734": "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria",
+ "735": "poncho",
+ "736": "pool table, billiard table, snooker table",
+ "737": "pop bottle, soda bottle",
+ "738": "pot, flowerpot",
+ "739": "potter's wheel",
+ "740": "power drill",
+ "741": "prayer rug, prayer mat",
+ "742": "printer",
+ "743": "prison, prison house",
+ "744": "projectile, missile",
+ "745": "projector",
+ "746": "puck, hockey puck",
+ "747": "punching bag, punch bag, punching ball, punchball",
+ "748": "purse",
+ "749": "quill, quill pen",
+ "750": "quilt, comforter, comfort, puff",
+ "751": "racer, race car, racing car",
+ "752": "racket, racquet",
+ "753": "radiator",
+ "754": "radio, wireless",
+ "755": "radio telescope, radio reflector",
+ "756": "rain barrel",
+ "757": "recreational vehicle, RV, R.V.",
+ "758": "reel",
+ "759": "reflex camera",
+ "760": "refrigerator, icebox",
+ "761": "remote control, remote",
+ "762": "restaurant, eating house, eating place, eatery",
+ "763": "revolver, six-gun, six-shooter",
+ "764": "rifle",
+ "765": "rocking chair, rocker",
+ "766": "rotisserie",
+ "767": "rubber eraser, rubber, pencil eraser",
+ "768": "rugby ball",
+ "769": "rule, ruler",
+ "770": "running shoe",
+ "771": "safe",
+ "772": "safety pin",
+ "773": "saltshaker, salt shaker",
+ "774": "sandal",
+ "775": "sarong",
+ "776": "sax, saxophone",
+ "777": "scabbard",
+ "778": "scale, weighing machine",
+ "779": "school bus",
+ "780": "schooner",
+ "781": "scoreboard",
+ "782": "screen, CRT screen",
+ "783": "screw",
+ "784": "screwdriver",
+ "785": "seat belt, seatbelt",
+ "786": "sewing machine",
+ "787": "shield, buckler",
+ "788": "shoe shop, shoe-shop, shoe store",
+ "789": "shoji",
+ "790": "shopping basket",
+ "791": "shopping cart",
+ "792": "shovel",
+ "793": "shower cap",
+ "794": "shower curtain",
+ "795": "ski",
+ "796": "ski mask",
+ "797": "sleeping bag",
+ "798": "slide rule, slipstick",
+ "799": "sliding door",
+ "800": "slot, one-armed bandit",
+ "801": "snorkel",
+ "802": "snowmobile",
+ "803": "snowplow, snowplough",
+ "804": "soap dispenser",
+ "805": "soccer ball",
+ "806": "sock",
+ "807": "solar dish, solar collector, solar furnace",
+ "808": "sombrero",
+ "809": "soup bowl",
+ "810": "space bar",
+ "811": "space heater",
+ "812": "space shuttle",
+ "813": "spatula",
+ "814": "speedboat",
+ "815": "spider web, spider's web",
+ "816": "spindle",
+ "817": "sports car, sport car",
+ "818": "spotlight, spot",
+ "819": "stage",
+ "820": "steam locomotive",
+ "821": "steel arch bridge",
+ "822": "steel drum",
+ "823": "stethoscope",
+ "824": "stole",
+ "825": "stone wall",
+ "826": "stopwatch, stop watch",
+ "827": "stove",
+ "828": "strainer",
+ "829": "streetcar, tram, tramcar, trolley, trolley car",
+ "830": "stretcher",
+ "831": "studio couch, day bed",
+ "832": "stupa, tope",
+ "833": "submarine, pigboat, sub, U-boat",
+ "834": "suit, suit of clothes",
+ "835": "sundial",
+ "836": "sunglass",
+ "837": "sunglasses, dark glasses, shades",
+ "838": "sunscreen, sunblock, sun blocker",
+ "839": "suspension bridge",
+ "840": "swab, swob, mop",
+ "841": "sweatshirt",
+ "842": "swimming trunks, bathing trunks",
+ "843": "swing",
+ "844": "switch, electric switch, electrical switch",
+ "845": "syringe",
+ "846": "table lamp",
+ "847": "tank, army tank, armored combat vehicle, armoured combat vehicle",
+ "848": "tape player",
+ "849": "teapot",
+ "850": "teddy, teddy bear",
+ "851": "television, television system",
+ "852": "tennis ball",
+ "853": "thatch, thatched roof",
+ "854": "theater curtain, theatre curtain",
+ "855": "thimble",
+ "856": "thresher, thrasher, threshing machine",
+ "857": "throne",
+ "858": "tile roof",
+ "859": "toaster",
+ "860": "tobacco shop, tobacconist shop, tobacconist",
+ "861": "toilet seat",
+ "862": "torch",
+ "863": "totem pole",
+ "864": "tow truck, tow car, wrecker",
+ "865": "toyshop",
+ "866": "tractor",
+ "867": "trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi",
+ "868": "tray",
+ "869": "trench coat",
+ "870": "tricycle, trike, velocipede",
+ "871": "trimaran",
+ "872": "tripod",
+ "873": "triumphal arch",
+ "874": "trolleybus, trolley coach, trackless trolley",
+ "875": "trombone",
+ "876": "tub, vat",
+ "877": "turnstile",
+ "878": "typewriter keyboard",
+ "879": "umbrella",
+ "880": "unicycle, monocycle",
+ "881": "upright, upright piano",
+ "882": "vacuum, vacuum cleaner",
+ "883": "vase",
+ "884": "vault",
+ "885": "velvet",
+ "886": "vending machine",
+ "887": "vestment",
+ "888": "viaduct",
+ "889": "violin, fiddle",
+ "890": "volleyball",
+ "891": "waffle iron",
+ "892": "wall clock",
+ "893": "wallet, billfold, notecase, pocketbook",
+ "894": "wardrobe, closet, press",
+ "895": "warplane, military plane",
+ "896": "washbasin, handbasin, washbowl, lavabo, wash-hand basin",
+ "897": "washer, automatic washer, washing machine",
+ "898": "water bottle",
+ "899": "water jug",
+ "900": "water tower",
+ "901": "whiskey jug",
+ "902": "whistle",
+ "903": "wig",
+ "904": "window screen",
+ "905": "window shade",
+ "906": "Windsor tie",
+ "907": "wine bottle",
+ "908": "wing",
+ "909": "wok",
+ "910": "wooden spoon",
+ "911": "wool, woolen, woollen",
+ "912": "worm fence, snake fence, snake-rail fence, Virginia fence",
+ "913": "wreck",
+ "914": "yawl",
+ "915": "yurt",
+ "916": "web site, website, internet site, site",
+ "917": "comic book",
+ "918": "crossword puzzle, crossword",
+ "919": "street sign",
+ "920": "traffic light, traffic signal, stoplight",
+ "921": "book jacket, dust cover, dust jacket, dust wrapper",
+ "922": "menu",
+ "923": "plate",
+ "924": "guacamole",
+ "925": "consomme",
+ "926": "hot pot, hotpot",
+ "927": "trifle",
+ "928": "ice cream, icecream",
+ "929": "ice lolly, lolly, lollipop, popsicle",
+ "930": "French loaf",
+ "931": "bagel, beigel",
+ "932": "pretzel",
+ "933": "cheeseburger",
+ "934": "hotdog, hot dog, red hot",
+ "935": "mashed potato",
+ "936": "head cabbage",
+ "937": "broccoli",
+ "938": "cauliflower",
+ "939": "zucchini, courgette",
+ "940": "spaghetti squash",
+ "941": "acorn squash",
+ "942": "butternut squash",
+ "943": "cucumber, cuke",
+ "944": "artichoke, globe artichoke",
+ "945": "bell pepper",
+ "946": "cardoon",
+ "947": "mushroom",
+ "948": "Granny Smith",
+ "949": "strawberry",
+ "950": "orange",
+ "951": "lemon",
+ "952": "fig",
+ "953": "pineapple, ananas",
+ "954": "banana",
+ "955": "jackfruit, jak, jack",
+ "956": "custard apple",
+ "957": "pomegranate",
+ "958": "hay",
+ "959": "carbonara",
+ "960": "chocolate sauce, chocolate syrup",
+ "961": "dough",
+ "962": "meat loaf, meatloaf",
+ "963": "pizza, pizza pie",
+ "964": "potpie",
+ "965": "burrito",
+ "966": "red wine",
+ "967": "espresso",
+ "968": "cup",
+ "969": "eggnog",
+ "970": "alp",
+ "971": "bubble",
+ "972": "cliff, drop, drop-off",
+ "973": "coral reef",
+ "974": "geyser",
+ "975": "lakeside, lakeshore",
+ "976": "promontory, headland, head, foreland",
+ "977": "sandbar, sand bar",
+ "978": "seashore, coast, seacoast, sea-coast",
+ "979": "valley, vale",
+ "980": "volcano",
+ "981": "ballplayer, baseball player",
+ "982": "groom, bridegroom",
+ "983": "scuba diver",
+ "984": "rapeseed",
+ "985": "daisy",
+ "986": "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum",
+ "987": "corn",
+ "988": "acorn",
+ "989": "hip, rose hip, rosehip",
+ "990": "buckeye, horse chestnut, conker",
+ "991": "coral fungus",
+ "992": "agaric",
+ "993": "gyromitra",
+ "994": "stinkhorn, carrion fungus",
+ "995": "earthstar",
+ "996": "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa",
+ "997": "bolete",
+ "998": "ear, spike, capitulum",
+ "999": "toilet tissue, toilet paper, bathroom tissue"
+ },
+ "image_size": 224,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "Afghan hound, Afghan": 160,
+ "African chameleon, Chamaeleo chamaeleon": 47,
+ "African crocodile, Nile crocodile, Crocodylus niloticus": 49,
+ "African elephant, Loxodonta africana": 386,
+ "African grey, African gray, Psittacus erithacus": 87,
+ "African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus": 275,
+ "Airedale, Airedale terrier": 191,
+ "American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier": 180,
+ "American alligator, Alligator mississipiensis": 50,
+ "American black bear, black bear, Ursus americanus, Euarctos americanus": 295,
+ "American chameleon, anole, Anolis carolinensis": 40,
+ "American coot, marsh hen, mud hen, water hen, Fulica americana": 137,
+ "American egret, great white heron, Egretta albus": 132,
+ "American lobster, Northern lobster, Maine lobster, Homarus americanus": 122,
+ "Angora, Angora rabbit": 332,
+ "Appenzeller": 240,
+ "Arabian camel, dromedary, Camelus dromedarius": 354,
+ "Arctic fox, white fox, Alopex lagopus": 279,
+ "Australian terrier": 193,
+ "Band Aid": 419,
+ "Bedlington terrier": 181,
+ "Bernese mountain dog": 239,
+ "Blenheim spaniel": 156,
+ "Border collie": 232,
+ "Border terrier": 182,
+ "Boston bull, Boston terrier": 195,
+ "Bouvier des Flandres, Bouviers des Flandres": 233,
+ "Brabancon griffon": 262,
+ "Brittany spaniel": 215,
+ "CD player": 485,
+ "Cardigan, Cardigan Welsh corgi": 264,
+ "Chesapeake Bay retriever": 209,
+ "Chihuahua": 151,
+ "Christmas stocking": 496,
+ "Crock Pot": 521,
+ "Dandie Dinmont, Dandie Dinmont terrier": 194,
+ "Doberman, Doberman pinscher": 236,
+ "Dungeness crab, Cancer magister": 118,
+ "Dutch oven": 544,
+ "Egyptian cat": 285,
+ "English foxhound": 167,
+ "English setter": 212,
+ "English springer, English springer spaniel": 217,
+ "EntleBucher": 241,
+ "Eskimo dog, husky": 248,
+ "European fire salamander, Salamandra salamandra": 25,
+ "European gallinule, Porphyrio porphyrio": 136,
+ "French bulldog": 245,
+ "French horn, horn": 566,
+ "French loaf": 930,
+ "German shepherd, German shepherd dog, German police dog, alsatian": 235,
+ "German short-haired pointer": 210,
+ "Gila monster, Heloderma suspectum": 45,
+ "Gordon setter": 214,
+ "Granny Smith": 948,
+ "Great Dane": 246,
+ "Great Pyrenees": 257,
+ "Greater Swiss Mountain dog": 238,
+ "Ibizan hound, Ibizan Podenco": 173,
+ "Indian cobra, Naja naja": 63,
+ "Indian elephant, Elephas maximus": 385,
+ "Irish setter, red setter": 213,
+ "Irish terrier": 184,
+ "Irish water spaniel": 221,
+ "Irish wolfhound": 170,
+ "Italian greyhound": 171,
+ "Japanese spaniel": 152,
+ "Kerry blue terrier": 183,
+ "Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis": 48,
+ "Labrador retriever": 208,
+ "Lakeland terrier": 189,
+ "Leonberg": 255,
+ "Lhasa, Lhasa apso": 204,
+ "Loafer": 630,
+ "Madagascar cat, ring-tailed lemur, Lemur catta": 383,
+ "Maltese dog, Maltese terrier, Maltese": 153,
+ "Mexican hairless": 268,
+ "Model T": 661,
+ "Newfoundland, Newfoundland dog": 256,
+ "Norfolk terrier": 185,
+ "Norwegian elkhound, elkhound": 174,
+ "Norwich terrier": 186,
+ "Old English sheepdog, bobtail": 229,
+ "Pekinese, Pekingese, Peke": 154,
+ "Pembroke, Pembroke Welsh corgi": 263,
+ "Persian cat": 283,
+ "Petri dish": 712,
+ "Polaroid camera, Polaroid Land camera": 732,
+ "Pomeranian": 259,
+ "Rhodesian ridgeback": 159,
+ "Rottweiler": 234,
+ "Saint Bernard, St Bernard": 247,
+ "Saluki, gazelle hound": 176,
+ "Samoyed, Samoyede": 258,
+ "Scotch terrier, Scottish terrier, Scottie": 199,
+ "Scottish deerhound, deerhound": 177,
+ "Sealyham terrier, Sealyham": 190,
+ "Shetland sheepdog, Shetland sheep dog, Shetland": 230,
+ "Shih-Tzu": 155,
+ "Siamese cat, Siamese": 284,
+ "Siberian husky": 250,
+ "Staffordshire bullterrier, Staffordshire bull terrier": 179,
+ "Sussex spaniel": 220,
+ "Tibetan mastiff": 244,
+ "Tibetan terrier, chrysanthemum dog": 200,
+ "Walker hound, Walker foxhound": 166,
+ "Weimaraner": 178,
+ "Welsh springer spaniel": 218,
+ "West Highland white terrier": 203,
+ "Windsor tie": 906,
+ "Yorkshire terrier": 187,
+ "abacus": 398,
+ "abaya": 399,
+ "academic gown, academic robe, judge's robe": 400,
+ "accordion, piano accordion, squeeze box": 401,
+ "acorn": 988,
+ "acorn squash": 941,
+ "acoustic guitar": 402,
+ "admiral": 321,
+ "affenpinscher, monkey pinscher, monkey dog": 252,
+ "agama": 42,
+ "agaric": 992,
+ "aircraft carrier, carrier, flattop, attack aircraft carrier": 403,
+ "airliner": 404,
+ "airship, dirigible": 405,
+ "albatross, mollymawk": 146,
+ "alligator lizard": 44,
+ "alp": 970,
+ "altar": 406,
+ "ambulance": 407,
+ "amphibian, amphibious vehicle": 408,
+ "analog clock": 409,
+ "anemone fish": 393,
+ "ant, emmet, pismire": 310,
+ "apiary, bee house": 410,
+ "apron": 411,
+ "armadillo": 363,
+ "artichoke, globe artichoke": 944,
+ "ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin": 412,
+ "assault rifle, assault gun": 413,
+ "axolotl, mud puppy, Ambystoma mexicanum": 29,
+ "baboon": 372,
+ "backpack, back pack, knapsack, packsack, rucksack, haversack": 414,
+ "badger": 362,
+ "bagel, beigel": 931,
+ "bakery, bakeshop, bakehouse": 415,
+ "balance beam, beam": 416,
+ "bald eagle, American eagle, Haliaeetus leucocephalus": 22,
+ "balloon": 417,
+ "ballplayer, baseball player": 981,
+ "ballpoint, ballpoint pen, ballpen, Biro": 418,
+ "banana": 954,
+ "banded gecko": 38,
+ "banjo": 420,
+ "bannister, banister, balustrade, balusters, handrail": 421,
+ "barbell": 422,
+ "barber chair": 423,
+ "barbershop": 424,
+ "barn": 425,
+ "barn spider, Araneus cavaticus": 73,
+ "barometer": 426,
+ "barracouta, snoek": 389,
+ "barrel, cask": 427,
+ "barrow, garden cart, lawn cart, wheelbarrow": 428,
+ "baseball": 429,
+ "basenji": 253,
+ "basketball": 430,
+ "basset, basset hound": 161,
+ "bassinet": 431,
+ "bassoon": 432,
+ "bath towel": 434,
+ "bathing cap, swimming cap": 433,
+ "bathtub, bathing tub, bath, tub": 435,
+ "beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon": 436,
+ "beacon, lighthouse, beacon light, pharos": 437,
+ "beagle": 162,
+ "beaker": 438,
+ "bearskin, busby, shako": 439,
+ "beaver": 337,
+ "bee": 309,
+ "bee eater": 92,
+ "beer bottle": 440,
+ "beer glass": 441,
+ "bell cote, bell cot": 442,
+ "bell pepper": 945,
+ "bib": 443,
+ "bicycle-built-for-two, tandem bicycle, tandem": 444,
+ "bighorn, bighorn sheep, cimarron, Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis": 349,
+ "bikini, two-piece": 445,
+ "binder, ring-binder": 446,
+ "binoculars, field glasses, opera glasses": 447,
+ "birdhouse": 448,
+ "bison": 347,
+ "bittern": 133,
+ "black and gold garden spider, Argiope aurantia": 72,
+ "black grouse": 80,
+ "black stork, Ciconia nigra": 128,
+ "black swan, Cygnus atratus": 100,
+ "black widow, Latrodectus mactans": 75,
+ "black-and-tan coonhound": 165,
+ "black-footed ferret, ferret, Mustela nigripes": 359,
+ "bloodhound, sleuthhound": 163,
+ "bluetick": 164,
+ "boa constrictor, Constrictor constrictor": 61,
+ "boathouse": 449,
+ "bobsled, bobsleigh, bob": 450,
+ "bolete": 997,
+ "bolo tie, bolo, bola tie, bola": 451,
+ "bonnet, poke bonnet": 452,
+ "book jacket, dust cover, dust jacket, dust wrapper": 921,
+ "bookcase": 453,
+ "bookshop, bookstore, bookstall": 454,
+ "borzoi, Russian wolfhound": 169,
+ "bottlecap": 455,
+ "bow": 456,
+ "bow tie, bow-tie, bowtie": 457,
+ "box turtle, box tortoise": 37,
+ "boxer": 242,
+ "brain coral": 109,
+ "brambling, Fringilla montifringilla": 10,
+ "brass, memorial tablet, plaque": 458,
+ "brassiere, bra, bandeau": 459,
+ "breakwater, groin, groyne, mole, bulwark, seawall, jetty": 460,
+ "breastplate, aegis, egis": 461,
+ "briard": 226,
+ "broccoli": 937,
+ "broom": 462,
+ "brown bear, bruin, Ursus arctos": 294,
+ "bubble": 971,
+ "bucket, pail": 463,
+ "buckeye, horse chestnut, conker": 990,
+ "buckle": 464,
+ "bulbul": 16,
+ "bull mastiff": 243,
+ "bullet train, bullet": 466,
+ "bulletproof vest": 465,
+ "bullfrog, Rana catesbeiana": 30,
+ "burrito": 965,
+ "bustard": 138,
+ "butcher shop, meat market": 467,
+ "butternut squash": 942,
+ "cab, hack, taxi, taxicab": 468,
+ "cabbage butterfly": 324,
+ "cairn, cairn terrier": 192,
+ "caldron, cauldron": 469,
+ "can opener, tin opener": 473,
+ "candle, taper, wax light": 470,
+ "cannon": 471,
+ "canoe": 472,
+ "capuchin, ringtail, Cebus capucinus": 378,
+ "car mirror": 475,
+ "car wheel": 479,
+ "carbonara": 959,
+ "cardigan": 474,
+ "cardoon": 946,
+ "carousel, carrousel, merry-go-round, roundabout, whirligig": 476,
+ "carpenter's kit, tool kit": 477,
+ "carton": 478,
+ "cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM": 480,
+ "cassette": 481,
+ "cassette player": 482,
+ "castle": 483,
+ "catamaran": 484,
+ "cauliflower": 938,
+ "cello, violoncello": 486,
+ "cellular telephone, cellular phone, cellphone, cell, mobile phone": 487,
+ "centipede": 79,
+ "chain": 488,
+ "chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour": 490,
+ "chain saw, chainsaw": 491,
+ "chainlink fence": 489,
+ "chambered nautilus, pearly nautilus, nautilus": 117,
+ "cheeseburger": 933,
+ "cheetah, chetah, Acinonyx jubatus": 293,
+ "chest": 492,
+ "chickadee": 19,
+ "chiffonier, commode": 493,
+ "chime, bell, gong": 494,
+ "chimpanzee, chimp, Pan troglodytes": 367,
+ "china cabinet, china closet": 495,
+ "chiton, coat-of-mail shell, sea cradle, polyplacophore": 116,
+ "chocolate sauce, chocolate syrup": 960,
+ "chow, chow chow": 260,
+ "church, church building": 497,
+ "cicada, cicala": 316,
+ "cinema, movie theater, movie theatre, movie house, picture palace": 498,
+ "cleaver, meat cleaver, chopper": 499,
+ "cliff dwelling": 500,
+ "cliff, drop, drop-off": 972,
+ "cloak": 501,
+ "clog, geta, patten, sabot": 502,
+ "clumber, clumber spaniel": 216,
+ "cock": 7,
+ "cocker spaniel, English cocker spaniel, cocker": 219,
+ "cockroach, roach": 314,
+ "cocktail shaker": 503,
+ "coffee mug": 504,
+ "coffeepot": 505,
+ "coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch": 391,
+ "coil, spiral, volute, whorl, helix": 506,
+ "collie": 231,
+ "colobus, colobus monkey": 375,
+ "combination lock": 507,
+ "comic book": 917,
+ "common iguana, iguana, Iguana iguana": 39,
+ "common newt, Triturus vulgaris": 26,
+ "computer keyboard, keypad": 508,
+ "conch": 112,
+ "confectionery, confectionary, candy store": 509,
+ "consomme": 925,
+ "container ship, containership, container vessel": 510,
+ "convertible": 511,
+ "coral fungus": 991,
+ "coral reef": 973,
+ "corkscrew, bottle screw": 512,
+ "corn": 987,
+ "cornet, horn, trumpet, trump": 513,
+ "coucal": 91,
+ "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor": 286,
+ "cowboy boot": 514,
+ "cowboy hat, ten-gallon hat": 515,
+ "coyote, prairie wolf, brush wolf, Canis latrans": 272,
+ "cradle": 516,
+ "crane": 517,
+ "crash helmet": 518,
+ "crate": 519,
+ "crayfish, crawfish, crawdad, crawdaddy": 124,
+ "crib, cot": 520,
+ "cricket": 312,
+ "croquet ball": 522,
+ "crossword puzzle, crossword": 918,
+ "crutch": 523,
+ "cucumber, cuke": 943,
+ "cuirass": 524,
+ "cup": 968,
+ "curly-coated retriever": 206,
+ "custard apple": 956,
+ "daisy": 985,
+ "dalmatian, coach dog, carriage dog": 251,
+ "dam, dike, dyke": 525,
+ "damselfly": 320,
+ "desk": 526,
+ "desktop computer": 527,
+ "dhole, Cuon alpinus": 274,
+ "dial telephone, dial phone": 528,
+ "diamondback, diamondback rattlesnake, Crotalus adamanteus": 67,
+ "diaper, nappy, napkin": 529,
+ "digital clock": 530,
+ "digital watch": 531,
+ "dingo, warrigal, warragal, Canis dingo": 273,
+ "dining table, board": 532,
+ "dishrag, dishcloth": 533,
+ "dishwasher, dish washer, dishwashing machine": 534,
+ "disk brake, disc brake": 535,
+ "dock, dockage, docking facility": 536,
+ "dogsled, dog sled, dog sleigh": 537,
+ "dome": 538,
+ "doormat, welcome mat": 539,
+ "dough": 961,
+ "dowitcher": 142,
+ "dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk": 319,
+ "drake": 97,
+ "drilling platform, offshore rig": 540,
+ "drum, membranophone, tympan": 541,
+ "drumstick": 542,
+ "dugong, Dugong dugon": 149,
+ "dumbbell": 543,
+ "dung beetle": 305,
+ "ear, spike, capitulum": 998,
+ "earthstar": 995,
+ "echidna, spiny anteater, anteater": 102,
+ "eel": 390,
+ "eft": 27,
+ "eggnog": 969,
+ "electric fan, blower": 545,
+ "electric guitar": 546,
+ "electric locomotive": 547,
+ "electric ray, crampfish, numbfish, torpedo": 5,
+ "entertainment center": 548,
+ "envelope": 549,
+ "espresso": 967,
+ "espresso maker": 550,
+ "face powder": 551,
+ "feather boa, boa": 552,
+ "fiddler crab": 120,
+ "fig": 952,
+ "file, file cabinet, filing cabinet": 553,
+ "fire engine, fire truck": 555,
+ "fire screen, fireguard": 556,
+ "fireboat": 554,
+ "flagpole, flagstaff": 557,
+ "flamingo": 130,
+ "flat-coated retriever": 205,
+ "flatworm, platyhelminth": 110,
+ "flute, transverse flute": 558,
+ "fly": 308,
+ "folding chair": 559,
+ "football helmet": 560,
+ "forklift": 561,
+ "fountain": 562,
+ "fountain pen": 563,
+ "four-poster": 564,
+ "fox squirrel, eastern fox squirrel, Sciurus niger": 335,
+ "freight car": 565,
+ "frilled lizard, Chlamydosaurus kingi": 43,
+ "frying pan, frypan, skillet": 567,
+ "fur coat": 568,
+ "gar, garfish, garpike, billfish, Lepisosteus osseus": 395,
+ "garbage truck, dustcart": 569,
+ "garden spider, Aranea diademata": 74,
+ "garter snake, grass snake": 57,
+ "gas pump, gasoline pump, petrol pump, island dispenser": 571,
+ "gasmask, respirator, gas helmet": 570,
+ "gazelle": 353,
+ "geyser": 974,
+ "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca": 388,
+ "giant schnauzer": 197,
+ "gibbon, Hylobates lar": 368,
+ "go-kart": 573,
+ "goblet": 572,
+ "golden retriever": 207,
+ "goldfinch, Carduelis carduelis": 11,
+ "goldfish, Carassius auratus": 1,
+ "golf ball": 574,
+ "golfcart, golf cart": 575,
+ "gondola": 576,
+ "gong, tam-tam": 577,
+ "goose": 99,
+ "gorilla, Gorilla gorilla": 366,
+ "gown": 578,
+ "grand piano, grand": 579,
+ "grasshopper, hopper": 311,
+ "great grey owl, great gray owl, Strix nebulosa": 24,
+ "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias": 2,
+ "green lizard, Lacerta viridis": 46,
+ "green mamba": 64,
+ "green snake, grass snake": 55,
+ "greenhouse, nursery, glasshouse": 580,
+ "grey fox, gray fox, Urocyon cinereoargenteus": 280,
+ "grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus": 147,
+ "grille, radiator grille": 581,
+ "grocery store, grocery, food market, market": 582,
+ "groenendael": 224,
+ "groom, bridegroom": 982,
+ "ground beetle, carabid beetle": 302,
+ "guacamole": 924,
+ "guenon, guenon monkey": 370,
+ "guillotine": 583,
+ "guinea pig, Cavia cobaya": 338,
+ "gyromitra": 993,
+ "hair slide": 584,
+ "hair spray": 585,
+ "half track": 586,
+ "hammer": 587,
+ "hammerhead, hammerhead shark": 4,
+ "hamper": 588,
+ "hamster": 333,
+ "hand blower, blow dryer, blow drier, hair dryer, hair drier": 589,
+ "hand-held computer, hand-held microcomputer": 590,
+ "handkerchief, hankie, hanky, hankey": 591,
+ "hard disc, hard disk, fixed disk": 592,
+ "hare": 331,
+ "harmonica, mouth organ, harp, mouth harp": 593,
+ "harp": 594,
+ "hartebeest": 351,
+ "harvester, reaper": 595,
+ "harvestman, daddy longlegs, Phalangium opilio": 70,
+ "hatchet": 596,
+ "hay": 958,
+ "head cabbage": 936,
+ "hen": 8,
+ "hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa": 996,
1493
+ "hermit crab": 125,
1494
+ "hip, rose hip, rosehip": 989,
1495
+ "hippopotamus, hippo, river horse, Hippopotamus amphibius": 344,
1496
+ "hog, pig, grunter, squealer, Sus scrofa": 341,
1497
+ "hognose snake, puff adder, sand viper": 54,
1498
+ "holster": 597,
1499
+ "home theater, home theatre": 598,
1500
+ "honeycomb": 599,
1501
+ "hook, claw": 600,
1502
+ "hoopskirt, crinoline": 601,
1503
+ "horizontal bar, high bar": 602,
1504
+ "hornbill": 93,
1505
+ "horned viper, cerastes, sand viper, horned asp, Cerastes cornutus": 66,
1506
+ "horse cart, horse-cart": 603,
1507
+ "hot pot, hotpot": 926,
1508
+ "hotdog, hot dog, red hot": 934,
1509
+ "hourglass": 604,
1510
+ "house finch, linnet, Carpodacus mexicanus": 12,
1511
+ "howler monkey, howler": 379,
1512
+ "hummingbird": 94,
1513
+ "hyena, hyaena": 276,
1514
+ "iPod": 605,
1515
+ "ibex, Capra ibex": 350,
1516
+ "ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus": 296,
1517
+ "ice cream, icecream": 928,
1518
+ "ice lolly, lolly, lollipop, popsicle": 929,
1519
+ "impala, Aepyceros melampus": 352,
1520
+ "indigo bunting, indigo finch, indigo bird, Passerina cyanea": 14,
1521
+ "indri, indris, Indri indri, Indri brevicaudatus": 384,
1522
+ "iron, smoothing iron": 606,
1523
+ "isopod": 126,
1524
+ "jacamar": 95,
1525
+ "jack-o'-lantern": 607,
1526
+ "jackfruit, jak, jack": 955,
1527
+ "jaguar, panther, Panthera onca, Felis onca": 290,
1528
+ "jay": 17,
1529
+ "jean, blue jean, denim": 608,
1530
+ "jeep, landrover": 609,
1531
+ "jellyfish": 107,
1532
+ "jersey, T-shirt, tee shirt": 610,
1533
+ "jigsaw puzzle": 611,
1534
+ "jinrikisha, ricksha, rickshaw": 612,
1535
+ "joystick": 613,
1536
+ "junco, snowbird": 13,
1537
+ "keeshond": 261,
1538
+ "kelpie": 227,
1539
+ "killer whale, killer, orca, grampus, sea wolf, Orcinus orca": 148,
1540
+ "kimono": 614,
1541
+ "king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica": 121,
1542
+ "king penguin, Aptenodytes patagonica": 145,
1543
+ "king snake, kingsnake": 56,
1544
+ "kit fox, Vulpes macrotis": 278,
1545
+ "kite": 21,
1546
+ "knee pad": 615,
1547
+ "knot": 616,
1548
+ "koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus": 105,
1549
+ "komondor": 228,
1550
+ "kuvasz": 222,
1551
+ "lab coat, laboratory coat": 617,
1552
+ "lacewing, lacewing fly": 318,
1553
+ "ladle": 618,
1554
+ "ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle": 301,
1555
+ "lakeside, lakeshore": 975,
1556
+ "lampshade, lamp shade": 619,
1557
+ "langur": 374,
1558
+ "laptop, laptop computer": 620,
1559
+ "lawn mower, mower": 621,
1560
+ "leaf beetle, chrysomelid": 304,
1561
+ "leafhopper": 317,
1562
+ "leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea": 34,
1563
+ "lemon": 951,
1564
+ "lens cap, lens cover": 622,
1565
+ "leopard, Panthera pardus": 288,
1566
+ "lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens": 387,
1567
+ "letter opener, paper knife, paperknife": 623,
1568
+ "library": 624,
1569
+ "lifeboat": 625,
1570
+ "lighter, light, igniter, ignitor": 626,
1571
+ "limousine, limo": 627,
1572
+ "limpkin, Aramus pictus": 135,
1573
+ "liner, ocean liner": 628,
1574
+ "lion, king of beasts, Panthera leo": 291,
1575
+ "lionfish": 396,
1576
+ "lipstick, lip rouge": 629,
1577
+ "little blue heron, Egretta caerulea": 131,
1578
+ "llama": 355,
1579
+ "loggerhead, loggerhead turtle, Caretta caretta": 33,
1580
+ "long-horned beetle, longicorn, longicorn beetle": 303,
1581
+ "lorikeet": 90,
1582
+ "lotion": 631,
1583
+ "loudspeaker, speaker, speaker unit, loudspeaker system, speaker system": 632,
1584
+ "loupe, jeweler's loupe": 633,
1585
+ "lumbermill, sawmill": 634,
1586
+ "lycaenid, lycaenid butterfly": 326,
1587
+ "lynx, catamount": 287,
1588
+ "macaque": 373,
1589
+ "macaw": 88,
1590
+ "magnetic compass": 635,
1591
+ "magpie": 18,
1592
+ "mailbag, postbag": 636,
1593
+ "mailbox, letter box": 637,
1594
+ "maillot": 638,
1595
+ "maillot, tank suit": 639,
1596
+ "malamute, malemute, Alaskan malamute": 249,
1597
+ "malinois": 225,
1598
+ "manhole cover": 640,
1599
+ "mantis, mantid": 315,
1600
+ "maraca": 641,
1601
+ "marimba, xylophone": 642,
1602
+ "marmoset": 377,
1603
+ "marmot": 336,
1604
+ "mashed potato": 935,
1605
+ "mask": 643,
1606
+ "matchstick": 644,
1607
+ "maypole": 645,
1608
+ "maze, labyrinth": 646,
1609
+ "measuring cup": 647,
1610
+ "meat loaf, meatloaf": 962,
1611
+ "medicine chest, medicine cabinet": 648,
1612
+ "meerkat, mierkat": 299,
1613
+ "megalith, megalithic structure": 649,
1614
+ "menu": 922,
1615
+ "microphone, mike": 650,
1616
+ "microwave, microwave oven": 651,
1617
+ "military uniform": 652,
1618
+ "milk can": 653,
1619
+ "miniature pinscher": 237,
1620
+ "miniature poodle": 266,
1621
+ "miniature schnauzer": 196,
1622
+ "minibus": 654,
1623
+ "miniskirt, mini": 655,
1624
+ "minivan": 656,
1625
+ "mink": 357,
1626
+ "missile": 657,
1627
+ "mitten": 658,
1628
+ "mixing bowl": 659,
1629
+ "mobile home, manufactured home": 660,
1630
+ "modem": 662,
1631
+ "monarch, monarch butterfly, milkweed butterfly, Danaus plexippus": 323,
1632
+ "monastery": 663,
1633
+ "mongoose": 298,
1634
+ "monitor": 664,
1635
+ "moped": 665,
1636
+ "mortar": 666,
1637
+ "mortarboard": 667,
1638
+ "mosque": 668,
1639
+ "mosquito net": 669,
1640
+ "motor scooter, scooter": 670,
1641
+ "mountain bike, all-terrain bike, off-roader": 671,
1642
+ "mountain tent": 672,
1643
+ "mouse, computer mouse": 673,
1644
+ "mousetrap": 674,
1645
+ "moving van": 675,
1646
+ "mud turtle": 35,
1647
+ "mushroom": 947,
1648
+ "muzzle": 676,
1649
+ "nail": 677,
1650
+ "neck brace": 678,
1651
+ "necklace": 679,
1652
+ "nematode, nematode worm, roundworm": 111,
1653
+ "night snake, Hypsiglena torquata": 60,
1654
+ "nipple": 680,
1655
+ "notebook, notebook computer": 681,
1656
+ "obelisk": 682,
1657
+ "oboe, hautboy, hautbois": 683,
1658
+ "ocarina, sweet potato": 684,
1659
+ "odometer, hodometer, mileometer, milometer": 685,
1660
+ "oil filter": 686,
1661
+ "orange": 950,
1662
+ "orangutan, orang, orangutang, Pongo pygmaeus": 365,
1663
+ "organ, pipe organ": 687,
1664
+ "oscilloscope, scope, cathode-ray oscilloscope, CRO": 688,
1665
+ "ostrich, Struthio camelus": 9,
1666
+ "otter": 360,
1667
+ "otterhound, otter hound": 175,
1668
+ "overskirt": 689,
1669
+ "ox": 345,
1670
+ "oxcart": 690,
1671
+ "oxygen mask": 691,
1672
+ "oystercatcher, oyster catcher": 143,
1673
+ "packet": 692,
1674
+ "paddle, boat paddle": 693,
1675
+ "paddlewheel, paddle wheel": 694,
1676
+ "padlock": 695,
1677
+ "paintbrush": 696,
1678
+ "pajama, pyjama, pj's, jammies": 697,
1679
+ "palace": 698,
1680
+ "panpipe, pandean pipe, syrinx": 699,
1681
+ "paper towel": 700,
1682
+ "papillon": 157,
1683
+ "parachute, chute": 701,
1684
+ "parallel bars, bars": 702,
1685
+ "park bench": 703,
1686
+ "parking meter": 704,
1687
+ "partridge": 86,
1688
+ "passenger car, coach, carriage": 705,
1689
+ "patas, hussar monkey, Erythrocebus patas": 371,
1690
+ "patio, terrace": 706,
1691
+ "pay-phone, pay-station": 707,
1692
+ "peacock": 84,
1693
+ "pedestal, plinth, footstall": 708,
1694
+ "pelican": 144,
1695
+ "pencil box, pencil case": 709,
1696
+ "pencil sharpener": 710,
1697
+ "perfume, essence": 711,
1698
+ "photocopier": 713,
1699
+ "pick, plectrum, plectron": 714,
1700
+ "pickelhaube": 715,
1701
+ "picket fence, paling": 716,
1702
+ "pickup, pickup truck": 717,
1703
+ "pier": 718,
1704
+ "piggy bank, penny bank": 719,
1705
+ "pill bottle": 720,
1706
+ "pillow": 721,
1707
+ "pineapple, ananas": 953,
1708
+ "ping-pong ball": 722,
1709
+ "pinwheel": 723,
1710
+ "pirate, pirate ship": 724,
1711
+ "pitcher, ewer": 725,
1712
+ "pizza, pizza pie": 963,
1713
+ "plane, carpenter's plane, woodworking plane": 726,
1714
+ "planetarium": 727,
1715
+ "plastic bag": 728,
1716
+ "plate": 923,
1717
+ "plate rack": 729,
1718
+ "platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus": 103,
1719
+ "plow, plough": 730,
1720
+ "plunger, plumber's helper": 731,
1721
+ "pole": 733,
1722
+ "polecat, fitch, foulmart, foumart, Mustela putorius": 358,
1723
+ "police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria": 734,
1724
+ "pomegranate": 957,
1725
+ "poncho": 735,
1726
+ "pool table, billiard table, snooker table": 736,
1727
+ "pop bottle, soda bottle": 737,
1728
+ "porcupine, hedgehog": 334,
1729
+ "pot, flowerpot": 738,
1730
+ "potpie": 964,
1731
+ "potter's wheel": 739,
1732
+ "power drill": 740,
1733
+ "prairie chicken, prairie grouse, prairie fowl": 83,
1734
+ "prayer rug, prayer mat": 741,
1735
+ "pretzel": 932,
1736
+ "printer": 742,
1737
+ "prison, prison house": 743,
1738
+ "proboscis monkey, Nasalis larvatus": 376,
1739
+ "projectile, missile": 744,
1740
+ "projector": 745,
1741
+ "promontory, headland, head, foreland": 976,
1742
+ "ptarmigan": 81,
1743
+ "puck, hockey puck": 746,
1744
+ "puffer, pufferfish, blowfish, globefish": 397,
1745
+ "pug, pug-dog": 254,
1746
+ "punching bag, punch bag, punching ball, punchball": 747,
1747
+ "purse": 748,
1748
+ "quail": 85,
1749
+ "quill, quill pen": 749,
1750
+ "quilt, comforter, comfort, puff": 750,
1751
+ "racer, race car, racing car": 751,
1752
+ "racket, racquet": 752,
1753
+ "radiator": 753,
1754
+ "radio telescope, radio reflector": 755,
1755
+ "radio, wireless": 754,
1756
+ "rain barrel": 756,
1757
+ "ram, tup": 348,
1758
+ "rapeseed": 984,
1759
+ "recreational vehicle, RV, R.V.": 757,
1760
+ "red fox, Vulpes vulpes": 277,
1761
+ "red wine": 966,
1762
+ "red wolf, maned wolf, Canis rufus, Canis niger": 271,
1763
+ "red-backed sandpiper, dunlin, Erolia alpina": 140,
1764
+ "red-breasted merganser, Mergus serrator": 98,
1765
+ "redbone": 168,
1766
+ "redshank, Tringa totanus": 141,
1767
+ "reel": 758,
1768
+ "reflex camera": 759,
1769
+ "refrigerator, icebox": 760,
1770
+ "remote control, remote": 761,
1771
+ "restaurant, eating house, eating place, eatery": 762,
1772
+ "revolver, six-gun, six-shooter": 763,
1773
+ "rhinoceros beetle": 306,
1774
+ "rifle": 764,
1775
+ "ringlet, ringlet butterfly": 322,
1776
+ "ringneck snake, ring-necked snake, ring snake": 53,
1777
+ "robin, American robin, Turdus migratorius": 15,
1778
+ "rock beauty, Holocanthus tricolor": 392,
1779
+ "rock crab, Cancer irroratus": 119,
1780
+ "rock python, rock snake, Python sebae": 62,
1781
+ "rocking chair, rocker": 765,
1782
+ "rotisserie": 766,
1783
+ "rubber eraser, rubber, pencil eraser": 767,
1784
+ "ruddy turnstone, Arenaria interpres": 139,
1785
+ "ruffed grouse, partridge, Bonasa umbellus": 82,
1786
+ "rugby ball": 768,
1787
+ "rule, ruler": 769,
1788
+ "running shoe": 770,
1789
+ "safe": 771,
1790
+ "safety pin": 772,
1791
+ "saltshaker, salt shaker": 773,
1792
+ "sandal": 774,
1793
+ "sandbar, sand bar": 977,
1794
+ "sarong": 775,
1795
+ "sax, saxophone": 776,
1796
+ "scabbard": 777,
1797
+ "scale, weighing machine": 778,
1798
+ "schipperke": 223,
1799
+ "school bus": 779,
1800
+ "schooner": 780,
1801
+ "scoreboard": 781,
1802
+ "scorpion": 71,
1803
+ "screen, CRT screen": 782,
1804
+ "screw": 783,
1805
+ "screwdriver": 784,
1806
+ "scuba diver": 983,
1807
+ "sea anemone, anemone": 108,
1808
+ "sea cucumber, holothurian": 329,
1809
+ "sea lion": 150,
1810
+ "sea slug, nudibranch": 115,
1811
+ "sea snake": 65,
1812
+ "sea urchin": 328,
1813
+ "seashore, coast, seacoast, sea-coast": 978,
1814
+ "seat belt, seatbelt": 785,
1815
+ "sewing machine": 786,
1816
+ "shield, buckler": 787,
1817
+ "shoe shop, shoe-shop, shoe store": 788,
1818
+ "shoji": 789,
1819
+ "shopping basket": 790,
1820
+ "shopping cart": 791,
1821
+ "shovel": 792,
1822
+ "shower cap": 793,
1823
+ "shower curtain": 794,
1824
+ "siamang, Hylobates syndactylus, Symphalangus syndactylus": 369,
1825
+ "sidewinder, horned rattlesnake, Crotalus cerastes": 68,
1826
+ "silky terrier, Sydney silky": 201,
1827
+ "ski": 795,
1828
+ "ski mask": 796,
1829
+ "skunk, polecat, wood pussy": 361,
1830
+ "sleeping bag": 797,
1831
+ "slide rule, slipstick": 798,
1832
+ "sliding door": 799,
1833
+ "slot, one-armed bandit": 800,
1834
+ "sloth bear, Melursus ursinus, Ursus ursinus": 297,
1835
+ "slug": 114,
1836
+ "snail": 113,
1837
+ "snorkel": 801,
1838
+ "snow leopard, ounce, Panthera uncia": 289,
1839
+ "snowmobile": 802,
1840
+ "snowplow, snowplough": 803,
1841
+ "soap dispenser": 804,
1842
+ "soccer ball": 805,
1843
+ "sock": 806,
1844
+ "soft-coated wheaten terrier": 202,
1845
+ "solar dish, solar collector, solar furnace": 807,
1846
+ "sombrero": 808,
1847
+ "sorrel": 339,
1848
+ "soup bowl": 809,
1849
+ "space bar": 810,
1850
+ "space heater": 811,
1851
+ "space shuttle": 812,
1852
+ "spaghetti squash": 940,
1853
+ "spatula": 813,
1854
+ "speedboat": 814,
1855
+ "spider monkey, Ateles geoffroyi": 381,
1856
+ "spider web, spider's web": 815,
1857
+ "spindle": 816,
1858
+ "spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish": 123,
1859
+ "spoonbill": 129,
1860
+ "sports car, sport car": 817,
1861
+ "spotlight, spot": 818,
1862
+ "spotted salamander, Ambystoma maculatum": 28,
1863
+ "squirrel monkey, Saimiri sciureus": 382,
1864
+ "stage": 819,
1865
+ "standard poodle": 267,
1866
+ "standard schnauzer": 198,
1867
+ "starfish, sea star": 327,
1868
+ "steam locomotive": 820,
1869
+ "steel arch bridge": 821,
1870
+ "steel drum": 822,
1871
+ "stethoscope": 823,
1872
+ "stingray": 6,
1873
+ "stinkhorn, carrion fungus": 994,
1874
+ "stole": 824,
1875
+ "stone wall": 825,
1876
+ "stopwatch, stop watch": 826,
1877
+ "stove": 827,
1878
+ "strainer": 828,
1879
+ "strawberry": 949,
1880
+ "street sign": 919,
1881
+ "streetcar, tram, tramcar, trolley, trolley car": 829,
1882
+ "stretcher": 830,
1883
+ "studio couch, day bed": 831,
1884
+ "stupa, tope": 832,
1885
+ "sturgeon": 394,
1886
+ "submarine, pigboat, sub, U-boat": 833,
1887
+ "suit, suit of clothes": 834,
1888
+ "sulphur butterfly, sulfur butterfly": 325,
1889
+ "sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita": 89,
1890
+ "sundial": 835,
1891
+ "sunglass": 836,
1892
+ "sunglasses, dark glasses, shades": 837,
1893
+ "sunscreen, sunblock, sun blocker": 838,
1894
+ "suspension bridge": 839,
1895
+ "swab, swob, mop": 840,
1896
+ "sweatshirt": 841,
1897
+ "swimming trunks, bathing trunks": 842,
1898
+ "swing": 843,
1899
+ "switch, electric switch, electrical switch": 844,
1900
+ "syringe": 845,
1901
+ "tabby, tabby cat": 281,
1902
+ "table lamp": 846,
1903
+ "tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui": 32,
1904
+ "tank, army tank, armored combat vehicle, armoured combat vehicle": 847,
1905
+ "tape player": 848,
1906
+ "tarantula": 76,
1907
+ "teapot": 849,
1908
+ "teddy, teddy bear": 850,
1909
+ "television, television system": 851,
1910
+ "tench, Tinca tinca": 0,
1911
+ "tennis ball": 852,
1912
+ "terrapin": 36,
1913
+ "thatch, thatched roof": 853,
1914
+ "theater curtain, theatre curtain": 854,
1915
+ "thimble": 855,
1916
+ "three-toed sloth, ai, Bradypus tridactylus": 364,
1917
+ "thresher, thrasher, threshing machine": 856,
1918
+ "throne": 857,
1919
+ "thunder snake, worm snake, Carphophis amoenus": 52,
1920
+ "tick": 78,
1921
+ "tiger beetle": 300,
1922
+ "tiger cat": 282,
1923
+ "tiger shark, Galeocerdo cuvieri": 3,
1924
+ "tiger, Panthera tigris": 292,
1925
+ "tile roof": 858,
1926
+ "timber wolf, grey wolf, gray wolf, Canis lupus": 269,
1927
+ "titi, titi monkey": 380,
1928
+ "toaster": 859,
1929
+ "tobacco shop, tobacconist shop, tobacconist": 860,
1930
+ "toilet seat": 861,
1931
+ "toilet tissue, toilet paper, bathroom tissue": 999,
1932
+ "torch": 862,
1933
+ "totem pole": 863,
1934
+ "toucan": 96,
1935
+ "tow truck, tow car, wrecker": 864,
1936
+ "toy poodle": 265,
1937
+ "toy terrier": 158,
1938
+ "toyshop": 865,
1939
+ "tractor": 866,
1940
+ "traffic light, traffic signal, stoplight": 920,
1941
+ "trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi": 867,
1942
+ "tray": 868,
1943
+ "tree frog, tree-frog": 31,
1944
+ "trench coat": 869,
1945
+ "triceratops": 51,
1946
+ "tricycle, trike, velocipede": 870,
1947
+ "trifle": 927,
1948
+ "trilobite": 69,
1949
+ "trimaran": 871,
1950
+ "tripod": 872,
1951
+ "triumphal arch": 873,
1952
+ "trolleybus, trolley coach, trackless trolley": 874,
1953
+ "trombone": 875,
1954
+ "tub, vat": 876,
1955
+ "turnstile": 877,
1956
+ "tusker": 101,
1957
+ "typewriter keyboard": 878,
1958
+ "umbrella": 879,
1959
+ "unicycle, monocycle": 880,
1960
+ "upright, upright piano": 881,
1961
+ "vacuum, vacuum cleaner": 882,
1962
+ "valley, vale": 979,
1963
+ "vase": 883,
1964
+ "vault": 884,
1965
+ "velvet": 885,
1966
+ "vending machine": 886,
1967
+ "vestment": 887,
1968
+ "viaduct": 888,
1969
+ "vine snake": 59,
1970
+ "violin, fiddle": 889,
1971
+ "vizsla, Hungarian pointer": 211,
1972
+ "volcano": 980,
1973
+ "volleyball": 890,
1974
+ "vulture": 23,
1975
+ "waffle iron": 891,
1976
+ "walking stick, walkingstick, stick insect": 313,
1977
+ "wall clock": 892,
1978
+ "wallaby, brush kangaroo": 104,
1979
+ "wallet, billfold, notecase, pocketbook": 893,
1980
+ "wardrobe, closet, press": 894,
1981
+ "warplane, military plane": 895,
1982
+ "warthog": 343,
1983
+ "washbasin, handbasin, washbowl, lavabo, wash-hand basin": 896,
1984
+ "washer, automatic washer, washing machine": 897,
1985
+ "water bottle": 898,
1986
+ "water buffalo, water ox, Asiatic buffalo, Bubalus bubalis": 346,
1987
+ "water jug": 899,
1988
+ "water ouzel, dipper": 20,
1989
+ "water snake": 58,
1990
+ "water tower": 900,
1991
+ "weasel": 356,
1992
+ "web site, website, internet site, site": 916,
1993
+ "weevil": 307,
1994
+ "whippet": 172,
1995
+ "whiptail, whiptail lizard": 41,
1996
+ "whiskey jug": 901,
1997
+ "whistle": 902,
1998
+ "white stork, Ciconia ciconia": 127,
1999
+ "white wolf, Arctic wolf, Canis lupus tundrarum": 270,
2000
+ "wig": 903,
2001
+ "wild boar, boar, Sus scrofa": 342,
2002
+ "window screen": 904,
2003
+ "window shade": 905,
2004
+ "wine bottle": 907,
2005
+ "wing": 908,
2006
+ "wire-haired fox terrier": 188,
2007
+ "wok": 909,
2008
+ "wolf spider, hunting spider": 77,
2009
+ "wombat": 106,
2010
+ "wood rabbit, cottontail, cottontail rabbit": 330,
2011
+ "wooden spoon": 910,
2012
+ "wool, woolen, woollen": 911,
2013
+ "worm fence, snake fence, snake-rail fence, Virginia fence": 912,
2014
+ "wreck": 913,
2015
+ "yawl": 914,
2016
+ "yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum": 986,
2017
+ "yurt": 915,
2018
+ "zebra": 340,
2019
+ "zucchini, courgette": 939
2020
+ },
2021
+ "layer_norm_eps": 1e-12,
2022
+ "model_type": "vit",
2023
+ "num_attention_heads": 12,
2024
+ "num_channels": 3,
2025
+ "num_hidden_layers": 12,
2026
+ "patch_size": 16,
2027
+ "qkv_bias": true,
2028
+ "semantic_loss_ignore_index": 255,
2029
+ "torch_dtype": "float32",
2030
+ "transformers_version": "4.31.0.dev0"
2031
+ }
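
The entries above are the `label2id` half of the config's label vocabulary over the 1000 ImageNet-1k classes (the matching `id2label` entries appear earlier in the file). As a quick sanity check, a minimal sketch of reading these fields back, assuming this commit's files are loaded with `trust_remote_code=True`; the repo id below is a placeholder:

```python
from transformers import AutoConfig

# Placeholder repo id; substitute the repository this commit belongs to.
config = AutoConfig.from_pretrained("Wi11Chan/ViTForSemanticSegmentation", trust_remote_code=True)

# label2id and id2label are inverse maps over the 1000 ImageNet-1k class names;
# id2label keys are converted to ints when the config is loaded.
assert config.label2id["zucchini, courgette"] == 939
assert config.id2label[939] == "zucchini, courgette"
print(config.model_type, config.patch_size, config.semantic_loss_ignore_index)  # vit 16 255
```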
configuration_vit.py ADDED
@@ -0,0 +1,144 @@
+ # coding=utf-8
+ # Copyright 2021 Google AI and The HuggingFace Inc. team. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """ ViT model configuration"""
+
+ from collections import OrderedDict
+ from typing import Mapping
+
+ from packaging import version
+
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.onnx import OnnxConfig
+ from transformers.utils import logging
+
+
+ logger = logging.get_logger(__name__)
+
+ VIT_PRETRAINED_CONFIG_ARCHIVE_MAP = {
+     "google/vit-base-patch16-224": "https://huggingface.co/vit-base-patch16-224/resolve/main/config.json",
+     # See all ViT models at https://huggingface.co/models?filter=vit
+ }
+
+
+ class ViTConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`ViTModel`]. It is used to instantiate a ViT
+     model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
+     defaults will yield a similar configuration to that of the ViT
+     [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+     Args:
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
+         hidden_act (`str` or `function`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"selu"` and `"gelu_new"` are supported.
+         hidden_dropout_prob (`float`, *optional*, defaults to 0.0):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
+             The dropout ratio for the attention probabilities.
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+             The epsilon used by the layer normalization layers.
+         image_size (`int`, *optional*, defaults to `224`):
+             The size (resolution) of each image.
+         patch_size (`int`, *optional*, defaults to `16`):
+             The size (resolution) of each patch.
+         num_channels (`int`, *optional*, defaults to `3`):
+             The number of input channels.
+         qkv_bias (`bool`, *optional*, defaults to `True`):
+             Whether to add a bias to the queries, keys and values.
+         encoder_stride (`int`, *optional*, defaults to 16):
+             Factor to increase the spatial resolution by in the decoder head for masked image modeling.
+         semantic_loss_ignore_index (`int`, *optional*, defaults to 255):
+             The index that is ignored by the loss function of the semantic segmentation model.
+
+     Example:
+
+     ```python
+     >>> from transformers import ViTConfig, ViTModel
+
+     >>> # Initializing a ViT vit-base-patch16-224 style configuration
+     >>> configuration = ViTConfig()
+
+     >>> # Initializing a model (with random weights) from the vit-base-patch16-224 style configuration
+     >>> model = ViTModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+     model_type = "vit"
+
+     def __init__(
+         self,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_act="gelu",
+         hidden_dropout_prob=0.0,
+         attention_probs_dropout_prob=0.0,
+         initializer_range=0.02,
+         layer_norm_eps=1e-12,
+         image_size=224,
+         patch_size=16,
+         num_channels=3,
+         qkv_bias=True,
+         encoder_stride=16,
+         semantic_loss_ignore_index=255,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.hidden_size = hidden_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.intermediate_size = intermediate_size
+         self.hidden_act = hidden_act
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.initializer_range = initializer_range
+         self.layer_norm_eps = layer_norm_eps
+         self.image_size = image_size
+         self.patch_size = patch_size
+         self.num_channels = num_channels
+         self.qkv_bias = qkv_bias
+         self.encoder_stride = encoder_stride
+         self.semantic_loss_ignore_index = semantic_loss_ignore_index
+
+
+ class ViTOnnxConfig(OnnxConfig):
+     torch_onnx_minimum_version = version.parse("1.11")
+
+     @property
+     def inputs(self) -> Mapping[str, Mapping[int, str]]:
+         return OrderedDict(
+             [
+                 ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
+             ]
+         )
+
+     @property
+     def atol_for_validation(self) -> float:
+         return 1e-4
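
Taken together, the defaults above reproduce the ViT-Base geometry. A minimal sketch of the arithmetic the configuration implies, using the module added in this commit:

```python
from configuration_vit import ViTConfig  # the module added above

config = ViTConfig()

# A 224x224 image cut into 16x16 patches yields (224 // 16) ** 2 = 196 patch
# tokens; the [CLS] token brings the sequence length to 197.
num_patches = (config.image_size // config.patch_size) ** 2
print(num_patches + 1)  # 197

# New in this upload: pixels labeled with this index are skipped by the
# semantic-segmentation loss (255 is the conventional "ignore" value).
print(config.semantic_loss_ignore_index)  # 255
```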
modeling_vit.py ADDED
@@ -0,0 +1,960 @@
+ # coding=utf-8
+ # Copyright 2021 Google AI, Ross Wightman, The HuggingFace Inc. team. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """ PyTorch ViT model."""
+
+
+ import collections.abc
+ import math
+ from typing import Dict, List, Optional, Set, Tuple, Union
+
+ import torch
+ import torch.utils.checkpoint
+ from torch import nn
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
+
+ from transformers.activations import ACT2FN
+ from transformers.modeling_outputs import (
+     BaseModelOutput,
+     BaseModelOutputWithPooling,
+     ImageClassifierOutput,
+     MaskedImageModelingOutput,
+     SemanticSegmenterOutput
+ )
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.pytorch_utils import find_pruneable_heads_and_indices, prune_linear_layer
+ from transformers.utils import (
+     add_code_sample_docstrings,
+     add_start_docstrings,
+     add_start_docstrings_to_model_forward,
+     logging,
+     replace_return_docstrings,
+ )
+ from .configuration_vit import ViTConfig
+
+
+ logger = logging.get_logger(__name__)
+
+ # General docstring
+ _CONFIG_FOR_DOC = "ViTConfig"
+
+ # Base docstring
+ _CHECKPOINT_FOR_DOC = "google/vit-base-patch16-224-in21k"
+ _EXPECTED_OUTPUT_SHAPE = [1, 197, 768]
+
+ # Image classification docstring
+ _IMAGE_CLASS_CHECKPOINT = "google/vit-base-patch16-224"
+ _IMAGE_CLASS_EXPECTED_OUTPUT = "Egyptian cat"
+
+
+ VIT_PRETRAINED_MODEL_ARCHIVE_LIST = [
+     "google/vit-base-patch16-224",
+     # See all ViT models at https://huggingface.co/models?filter=vit
+ ]
+
+
+ class ViTEmbeddings(nn.Module):
+     """
+     Construct the CLS token, position and patch embeddings. Optionally, also the mask token.
+     """
+
+     def __init__(self, config: ViTConfig, use_mask_token: bool = False) -> None:
+         super().__init__()
+
+         self.cls_token = nn.Parameter(torch.randn(1, 1, config.hidden_size))
+         self.mask_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size)) if use_mask_token else None
+         self.patch_embeddings = ViTPatchEmbeddings(config)
+         num_patches = self.patch_embeddings.num_patches
+         self.position_embeddings = nn.Parameter(torch.randn(1, num_patches + 1, config.hidden_size))
+         self.dropout = nn.Dropout(config.hidden_dropout_prob)
+         self.config = config
+
+     def interpolate_pos_encoding(self, embeddings: torch.Tensor, height: int, width: int) -> torch.Tensor:
+         """
+         This method interpolates the pre-trained position encodings so that the model can be used on
+         higher-resolution images.
+
+         Source:
+         https://github.com/facebookresearch/dino/blob/de9ee3df6cf39fac952ab558447af1fa1365362a/vision_transformer.py#L174
+         """
+
+         num_patches = embeddings.shape[1] - 1
+         num_positions = self.position_embeddings.shape[1] - 1
+         if num_patches == num_positions and height == width:
+             return self.position_embeddings
+         class_pos_embed = self.position_embeddings[:, 0]
+         patch_pos_embed = self.position_embeddings[:, 1:]
+         dim = embeddings.shape[-1]
+         h0 = height // self.config.patch_size
+         w0 = width // self.config.patch_size
+         # we add a small number to avoid floating point error in the interpolation
+         # see discussion at https://github.com/facebookresearch/dino/issues/8
+         h0, w0 = h0 + 0.1, w0 + 0.1
+         patch_pos_embed = patch_pos_embed.reshape(1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim)
+         patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
+         patch_pos_embed = nn.functional.interpolate(
+             patch_pos_embed,
+             scale_factor=(h0 / math.sqrt(num_positions), w0 / math.sqrt(num_positions)),
+             mode="bicubic",
+             align_corners=False,
+         )
+         assert int(h0) == patch_pos_embed.shape[-2] and int(w0) == patch_pos_embed.shape[-1]
+         patch_pos_embed = patch_pos_embed.permute(0, 2, 3, 1).view(1, -1, dim)
+         return torch.cat((class_pos_embed.unsqueeze(0), patch_pos_embed), dim=1)
+
+     def forward(
+         self,
+         pixel_values: torch.Tensor,
+         bool_masked_pos: Optional[torch.BoolTensor] = None,
+         interpolate_pos_encoding: bool = False,
+     ) -> torch.Tensor:
+         batch_size, num_channels, height, width = pixel_values.shape
+         embeddings = self.patch_embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
+
+         if bool_masked_pos is not None:
+             seq_length = embeddings.shape[1]
+             mask_tokens = self.mask_token.expand(batch_size, seq_length, -1)
+             # replace the masked visual tokens by mask_tokens
+             mask = bool_masked_pos.unsqueeze(-1).type_as(mask_tokens)
+             embeddings = embeddings * (1.0 - mask) + mask_tokens * mask
+
+         # add the [CLS] token to the embedded patch tokens
+         cls_tokens = self.cls_token.expand(batch_size, -1, -1)
+         embeddings = torch.cat((cls_tokens, embeddings), dim=1)
+
+         # add positional encoding to each token
+         if interpolate_pos_encoding:
+             embeddings = embeddings + self.interpolate_pos_encoding(embeddings, height, width)
+         else:
+             embeddings = embeddings + self.position_embeddings
+
+         embeddings = self.dropout(embeddings)
+
+         return embeddings
+
+
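`interpolate_pos_encoding` above resizes the pre-trained 14x14 grid of patch position embeddings to whatever grid a larger input implies. The same arithmetic in isolation, as a standalone sketch (not part of the uploaded file):

```python
import math

import torch
from torch import nn

hidden_size, patch_size, num_positions = 768, 16, 196  # 14x14 grid from 224x224 pre-training
patch_pos_embed = torch.randn(1, num_positions, hidden_size)

height = width = 480  # hypothetical higher-resolution input
h0, w0 = height // patch_size + 0.1, width // patch_size + 0.1  # 30.1; the 0.1 is the rounding guard

grid = int(math.sqrt(num_positions))  # 14
resized = nn.functional.interpolate(
    patch_pos_embed.reshape(1, grid, grid, hidden_size).permute(0, 3, 1, 2),
    scale_factor=(h0 / grid, w0 / grid),
    mode="bicubic",
    align_corners=False,
)
print(resized.shape)  # torch.Size([1, 768, 30, 30]) -> flattened back to 900 positions
```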
+ class ViTPatchEmbeddings(nn.Module):
+     """
+     This class turns `pixel_values` of shape `(batch_size, num_channels, height, width)` into the initial
+     `hidden_states` (patch embeddings) of shape `(batch_size, seq_length, hidden_size)` to be consumed by a
+     Transformer.
+     """
+
+     def __init__(self, config):
+         super().__init__()
+         image_size, patch_size = config.image_size, config.patch_size
+         num_channels, hidden_size = config.num_channels, config.hidden_size
+
+         image_size = image_size if isinstance(image_size, collections.abc.Iterable) else (image_size, image_size)
+         patch_size = patch_size if isinstance(patch_size, collections.abc.Iterable) else (patch_size, patch_size)
+         num_patches = (image_size[1] // patch_size[1]) * (image_size[0] // patch_size[0])
+         self.image_size = image_size
+         self.patch_size = patch_size
+         self.num_channels = num_channels
+         self.num_patches = num_patches
+
+         self.projection = nn.Conv2d(num_channels, hidden_size, kernel_size=patch_size, stride=patch_size)
+
+     def forward(self, pixel_values: torch.Tensor, interpolate_pos_encoding: bool = False) -> torch.Tensor:
+         batch_size, num_channels, height, width = pixel_values.shape
+         if num_channels != self.num_channels:
+             raise ValueError(
+                 "Make sure that the channel dimension of the pixel values matches the one set in the configuration."
+                 f" Expected {self.num_channels} but got {num_channels}."
+             )
+         if not interpolate_pos_encoding:
+             if height != self.image_size[0] or width != self.image_size[1]:
+                 raise ValueError(
+                     f"Input image size ({height}*{width}) doesn't match model"
+                     f" ({self.image_size[0]}*{self.image_size[1]})."
+                 )
+         embeddings = self.projection(pixel_values).flatten(2).transpose(1, 2)
+         return embeddings
+
+
+ class ViTSelfAttention(nn.Module):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
+             raise ValueError(
+                 f"The hidden size {config.hidden_size} is not a multiple of the number of attention "
+                 f"heads {config.num_attention_heads}."
+             )
+
+         self.num_attention_heads = config.num_attention_heads
+         self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
+         self.all_head_size = self.num_attention_heads * self.attention_head_size
+
+         self.query = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)
+         self.key = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)
+         self.value = nn.Linear(config.hidden_size, self.all_head_size, bias=config.qkv_bias)
+
+         self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
+
+     def transpose_for_scores(self, x: torch.Tensor) -> torch.Tensor:
+         new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
+         x = x.view(new_x_shape)
+         return x.permute(0, 2, 1, 3)
+
+     def forward(
+         self, hidden_states, head_mask: Optional[torch.Tensor] = None, output_attentions: bool = False
+     ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+         mixed_query_layer = self.query(hidden_states)
+
+         key_layer = self.transpose_for_scores(self.key(hidden_states))
+         value_layer = self.transpose_for_scores(self.value(hidden_states))
+         query_layer = self.transpose_for_scores(mixed_query_layer)
+
+         # Take the dot product between "query" and "key" to get the raw attention scores.
+         attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
+
+         attention_scores = attention_scores / math.sqrt(self.attention_head_size)
+
+         # Normalize the attention scores to probabilities.
+         attention_probs = nn.functional.softmax(attention_scores, dim=-1)
+
+         # This is actually dropping out entire tokens to attend to, which might
+         # seem a bit unusual, but is taken from the original Transformer paper.
+         attention_probs = self.dropout(attention_probs)
+
+         # Mask heads if we want to
+         if head_mask is not None:
+             attention_probs = attention_probs * head_mask
+
+         context_layer = torch.matmul(attention_probs, value_layer)
+
+         context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
+         new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
+         context_layer = context_layer.view(new_context_layer_shape)
+
+         outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)
+
+         return outputs
+
+
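For reference, the shape bookkeeping that `transpose_for_scores` and the two matmuls above perform, spelled out with ViT-Base sizes (a standalone sketch):

```python
import math

import torch

batch_size, seq_len, hidden_size = 2, 197, 768
num_heads, head_size = 12, 768 // 12  # 64

hidden_states = torch.randn(batch_size, seq_len, hidden_size)
# (batch, seq_len, hidden) -> (batch, num_heads, seq_len, head_size)
q = k = v = hidden_states.view(batch_size, seq_len, num_heads, head_size).permute(0, 2, 1, 3)

scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(head_size)  # (2, 12, 197, 197)
probs = scores.softmax(dim=-1)
context = torch.matmul(probs, v).permute(0, 2, 1, 3).reshape(batch_size, seq_len, hidden_size)
print(scores.shape, context.shape)  # (2, 12, 197, 197) and (2, 197, 768)
```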
+ class ViTSelfOutput(nn.Module):
+     """
+     The residual connection is defined in ViTLayer instead of here (as is the case with other models), due to the
+     layernorm applied before each block.
+     """
+
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+         self.dropout = nn.Dropout(config.hidden_dropout_prob)
+
+     def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.dropout(hidden_states)
+
+         return hidden_states
+
+
+ class ViTAttention(nn.Module):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.attention = ViTSelfAttention(config)
+         self.output = ViTSelfOutput(config)
+         self.pruned_heads = set()
+
+     def prune_heads(self, heads: Set[int]) -> None:
+         if len(heads) == 0:
+             return
+         heads, index = find_pruneable_heads_and_indices(
+             heads, self.attention.num_attention_heads, self.attention.attention_head_size, self.pruned_heads
+         )
+
+         # Prune linear layers
+         self.attention.query = prune_linear_layer(self.attention.query, index)
+         self.attention.key = prune_linear_layer(self.attention.key, index)
+         self.attention.value = prune_linear_layer(self.attention.value, index)
+         self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)
+
+         # Update hyper params and store pruned heads
+         self.attention.num_attention_heads = self.attention.num_attention_heads - len(heads)
+         self.attention.all_head_size = self.attention.attention_head_size * self.attention.num_attention_heads
+         self.pruned_heads = self.pruned_heads.union(heads)
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         head_mask: Optional[torch.Tensor] = None,
+         output_attentions: bool = False,
+     ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+         self_outputs = self.attention(hidden_states, head_mask, output_attentions)
+
+         attention_output = self.output(self_outputs[0], hidden_states)
+
+         outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
+         return outputs
+
+
+ class ViTIntermediate(nn.Module):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
+         if isinstance(config.hidden_act, str):
+             self.intermediate_act_fn = ACT2FN[config.hidden_act]
+         else:
+             self.intermediate_act_fn = config.hidden_act
+
+     def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.intermediate_act_fn(hidden_states)
+
+         return hidden_states
+
+
+ class ViTOutput(nn.Module):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
+         self.dropout = nn.Dropout(config.hidden_dropout_prob)
+
+     def forward(self, hidden_states: torch.Tensor, input_tensor: torch.Tensor) -> torch.Tensor:
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.dropout(hidden_states)
+
+         hidden_states = hidden_states + input_tensor
+
+         return hidden_states
+
+
+ class ViTLayer(nn.Module):
+     """This corresponds to the Block class in the timm implementation."""
+
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.chunk_size_feed_forward = config.chunk_size_feed_forward
+         self.seq_len_dim = 1
+         self.attention = ViTAttention(config)
+         self.intermediate = ViTIntermediate(config)
+         self.output = ViTOutput(config)
+         self.layernorm_before = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+         self.layernorm_after = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         head_mask: Optional[torch.Tensor] = None,
+         output_attentions: bool = False,
+     ) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor]]:
+         self_attention_outputs = self.attention(
+             self.layernorm_before(hidden_states),  # in ViT, layernorm is applied before self-attention
+             head_mask,
+             output_attentions=output_attentions,
+         )
+         attention_output = self_attention_outputs[0]
+         outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights
+
+         # first residual connection
+         hidden_states = attention_output + hidden_states
+
+         # in ViT, layernorm is also applied after self-attention
+         layer_output = self.layernorm_after(hidden_states)
+         layer_output = self.intermediate(layer_output)
+
+         # second residual connection is done here
+         layer_output = self.output(layer_output, hidden_states)
+
+         outputs = (layer_output,) + outputs
+
+         return outputs
+
+
+ class ViTEncoder(nn.Module):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__()
+         self.config = config
+         self.layer = nn.ModuleList([ViTLayer(config) for _ in range(config.num_hidden_layers)])
+         self.gradient_checkpointing = False
+
+     def forward(
+         self,
+         hidden_states: torch.Tensor,
+         head_mask: Optional[torch.Tensor] = None,
+         output_attentions: bool = False,
+         output_hidden_states: bool = False,
+         return_dict: bool = True,
+     ) -> Union[tuple, BaseModelOutput]:
+         all_hidden_states = () if output_hidden_states else None
+         all_self_attentions = () if output_attentions else None
+
+         for i, layer_module in enumerate(self.layer):
+             if output_hidden_states:
+                 all_hidden_states = all_hidden_states + (hidden_states,)
+
+             layer_head_mask = head_mask[i] if head_mask is not None else None
+
+             if self.gradient_checkpointing and self.training:
+
+                 def create_custom_forward(module):
+                     def custom_forward(*inputs):
+                         return module(*inputs, output_attentions)
+
+                     return custom_forward
+
+                 layer_outputs = torch.utils.checkpoint.checkpoint(
+                     create_custom_forward(layer_module),
+                     hidden_states,
+                     layer_head_mask,
+                 )
+             else:
+                 layer_outputs = layer_module(hidden_states, layer_head_mask, output_attentions)
+
+             hidden_states = layer_outputs[0]
+
+             if output_attentions:
+                 all_self_attentions = all_self_attentions + (layer_outputs[1],)
+
+         if output_hidden_states:
+             all_hidden_states = all_hidden_states + (hidden_states,)
+
+         if not return_dict:
+             return tuple(v for v in [hidden_states, all_hidden_states, all_self_attentions] if v is not None)
+         return BaseModelOutput(
+             last_hidden_state=hidden_states,
+             hidden_states=all_hidden_states,
+             attentions=all_self_attentions,
+         )
+
+
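`ViTEncoder` re-runs each layer under `torch.utils.checkpoint` when `gradient_checkpointing` is set, trading extra compute for activation memory. A sketch of turning it on through the standard `PreTrainedModel` API, using the modules from this upload:

```python
import torch

from configuration_vit import ViTConfig
from modeling_vit import ViTModel

model = ViTModel(ViTConfig())
model.gradient_checkpointing_enable()  # flips encoder.gradient_checkpointing to True
model.train()  # checkpointing only takes effect in training mode

outputs = model(torch.randn(1, 3, 224, 224))
print(outputs.last_hidden_state.shape)  # torch.Size([1, 197, 768])
```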
+ class ViTPreTrainedModel(PreTrainedModel):
+     """
+     An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
+     models.
+     """
+
+     config_class = ViTConfig
+     base_model_prefix = "vit"
+     main_input_name = "pixel_values"
+     supports_gradient_checkpointing = True
+     _no_split_modules = []
+
+     def _init_weights(self, module: Union[nn.Linear, nn.Conv2d, nn.LayerNorm]) -> None:
+         """Initialize the weights"""
+         if isinstance(module, (nn.Linear, nn.Conv2d)):
+             # Upcast the input in `fp32` and cast it back to desired `dtype` to avoid
+             # `trunc_normal_cpu` not implemented in `half` issues
+             module.weight.data = nn.init.trunc_normal_(
+                 module.weight.data.to(torch.float32), mean=0.0, std=self.config.initializer_range
+             ).to(module.weight.dtype)
+             if module.bias is not None:
+                 module.bias.data.zero_()
+         elif isinstance(module, nn.LayerNorm):
+             module.bias.data.zero_()
+             module.weight.data.fill_(1.0)
+         elif isinstance(module, ViTEmbeddings):
+             module.position_embeddings.data = nn.init.trunc_normal_(
+                 module.position_embeddings.data.to(torch.float32),
+                 mean=0.0,
+                 std=self.config.initializer_range,
+             ).to(module.position_embeddings.dtype)
+
+             module.cls_token.data = nn.init.trunc_normal_(
+                 module.cls_token.data.to(torch.float32),
+                 mean=0.0,
+                 std=self.config.initializer_range,
+             ).to(module.cls_token.dtype)
+
+     def _set_gradient_checkpointing(self, module: ViTEncoder, value: bool = False) -> None:
+         if isinstance(module, ViTEncoder):
+             module.gradient_checkpointing = value
+
+
+ VIT_START_DOCSTRING = r"""
+     This model is a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. Use it
+     as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and
+     behavior.
+
+     Parameters:
+         config ([`ViTConfig`]): Model configuration class with all the parameters of the model.
+             Initializing with a config file does not load the weights associated with the model, only the
+             configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
+ """
+
+ VIT_INPUTS_DOCSTRING = r"""
+     Args:
+         pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
+             Pixel values. Pixel values can be obtained using [`AutoImageProcessor`]. See [`ViTImageProcessor.__call__`]
+             for details.
+
+         head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
+             Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
+
+             - 1 indicates the head is **not masked**,
+             - 0 indicates the head is **masked**.
+
+         output_attentions (`bool`, *optional*):
+             Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
+             tensors for more detail.
+         output_hidden_states (`bool`, *optional*):
+             Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
+             more detail.
+         interpolate_pos_encoding (`bool`, *optional*):
+             Whether to interpolate the pre-trained position encodings.
+         return_dict (`bool`, *optional*):
+             Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
+ """
+
+
+ @add_start_docstrings(
+     "The bare ViT Model transformer outputting raw hidden-states without any specific head on top.",
+     VIT_START_DOCSTRING,
+ )
+ class ViTModel(ViTPreTrainedModel):
+     def __init__(self, config: ViTConfig, add_pooling_layer: bool = True, use_mask_token: bool = False):
+         super().__init__(config)
+         self.config = config
+
+         self.embeddings = ViTEmbeddings(config, use_mask_token=use_mask_token)
+         self.encoder = ViTEncoder(config)
+
+         self.layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+         self.pooler = ViTPooler(config) if add_pooling_layer else None
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def get_input_embeddings(self) -> ViTPatchEmbeddings:
+         return self.embeddings.patch_embeddings
+
+     def _prune_heads(self, heads_to_prune: Dict[int, List[int]]) -> None:
+         """
+         Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
+         class PreTrainedModel
+         """
+         for layer, heads in heads_to_prune.items():
+             self.encoder.layer[layer].attention.prune_heads(heads)
+
+     @add_start_docstrings_to_model_forward(VIT_INPUTS_DOCSTRING)
+     @add_code_sample_docstrings(
+         checkpoint=_CHECKPOINT_FOR_DOC,
+         output_type=BaseModelOutputWithPooling,
+         config_class=_CONFIG_FOR_DOC,
+         modality="vision",
+         expected_output=_EXPECTED_OUTPUT_SHAPE,
+     )
+     def forward(
+         self,
+         pixel_values: Optional[torch.Tensor] = None,
+         bool_masked_pos: Optional[torch.BoolTensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         interpolate_pos_encoding: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[Tuple, BaseModelOutputWithPooling]:
+         r"""
+         bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, num_patches)`, *optional*):
+             Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).
+         """
+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+         output_hidden_states = (
+             output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+         )
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if pixel_values is None:
+             raise ValueError("You have to specify pixel_values")
+
+         # Prepare head mask if needed
+         # 1.0 in head_mask indicate we keep the head
+         # attention_probs has shape bsz x n_heads x N x N
+         # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
+         # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
+         head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
+
+         # TODO: maybe have a cleaner way to cast the input (from `ImageProcessor` side?)
+         expected_dtype = self.embeddings.patch_embeddings.projection.weight.dtype
+         if pixel_values.dtype != expected_dtype:
+             pixel_values = pixel_values.to(expected_dtype)
+
+         embedding_output = self.embeddings(
+             pixel_values, bool_masked_pos=bool_masked_pos, interpolate_pos_encoding=interpolate_pos_encoding
+         )
+
+         encoder_outputs = self.encoder(
+             embedding_output,
+             head_mask=head_mask,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+         )
+         sequence_output = encoder_outputs[0]
+         sequence_output = self.layernorm(sequence_output)
+         pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
+
+         if not return_dict:
+             head_outputs = (sequence_output, pooled_output) if pooled_output is not None else (sequence_output,)
+             return head_outputs + encoder_outputs[1:]
+
+         return BaseModelOutputWithPooling(
+             last_hidden_state=sequence_output,
+             pooler_output=pooled_output,
+             hidden_states=encoder_outputs.hidden_states,
+             attentions=encoder_outputs.attentions,
+         )
+
+
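`ViTModel` also inherits head pruning from `PreTrainedModel`: `_prune_heads` above forwards to `ViTAttention.prune_heads`, which shrinks the query/key/value projections in place. A sketch:

```python
from configuration_vit import ViTConfig
from modeling_vit import ViTModel

model = ViTModel(ViTConfig())
# Remove heads 0 and 1 in layer 0 and head 11 in layer 5.
model.prune_heads({0: [0, 1], 5: [11]})
print(model.encoder.layer[0].attention.attention.num_attention_heads)  # 10
print(model.encoder.layer[0].attention.attention.query.out_features)  # 10 * 64 = 640
```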
611
+ class ViTPooler(nn.Module):
612
+ def __init__(self, config: ViTConfig):
613
+ super().__init__()
614
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
615
+ self.activation = nn.Tanh()
616
+
617
+ def forward(self, hidden_states):
618
+ # We "pool" the model by simply taking the hidden state corresponding
619
+ # to the first token.
620
+ first_token_tensor = hidden_states[:, 0]
621
+ pooled_output = self.dense(first_token_tensor)
622
+ pooled_output = self.activation(pooled_output)
623
+ return pooled_output
624
+
625
+
+ @add_start_docstrings(
+     """ViT Model with a decoder on top for masked image modeling, as proposed in [SimMIM](https://arxiv.org/abs/2111.09886).
+
+     <Tip>
+
+     Note that we provide a script to pre-train this model on custom data in our [examples
+     directory](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-pretraining).
+
+     </Tip>
+     """,
+     VIT_START_DOCSTRING,
+ )
+ class ViTForMaskedImageModeling(ViTPreTrainedModel):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__(config)
+
+         self.vit = ViTModel(config, add_pooling_layer=False, use_mask_token=True)
+
+         self.decoder = nn.Sequential(
+             nn.Conv2d(
+                 in_channels=config.hidden_size,
+                 out_channels=config.encoder_stride**2 * config.num_channels,
+                 kernel_size=1,
+             ),
+             nn.PixelShuffle(config.encoder_stride),
+         )
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
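The decoder above is a 1x1 convolution followed by `nn.PixelShuffle`: the convolution expands each patch feature to `encoder_stride**2 * num_channels` values, and the shuffle rearranges them into an `encoder_stride x encoder_stride` block of pixels. A toy shape check, using ViT-Base-like numbers as stand-ins for the real config values:

```python
import torch
from torch import nn

# Stand-ins for config.hidden_size, config.encoder_stride, config.num_channels.
hidden_size, encoder_stride, num_channels = 768, 16, 3

decoder = nn.Sequential(
    nn.Conv2d(hidden_size, encoder_stride**2 * num_channels, kernel_size=1),
    nn.PixelShuffle(encoder_stride),
)

features = torch.randn(1, hidden_size, 14, 14)  # a 14x14 grid of patch features
print(decoder(features).shape)  # torch.Size([1, 3, 224, 224]) -- full-resolution image
```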
+     @add_start_docstrings_to_model_forward(VIT_INPUTS_DOCSTRING)
+     @replace_return_docstrings(output_type=MaskedImageModelingOutput, config_class=_CONFIG_FOR_DOC)
+     def forward(
+         self,
+         pixel_values: Optional[torch.Tensor] = None,
+         bool_masked_pos: Optional[torch.BoolTensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         interpolate_pos_encoding: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[tuple, MaskedImageModelingOutput]:
+         r"""
+         bool_masked_pos (`torch.BoolTensor` of shape `(batch_size, num_patches)`):
+             Boolean masked positions. Indicates which patches are masked (1) and which aren't (0).
+
+         Returns:
+
+         Examples:
+         ```python
+         >>> from transformers import AutoImageProcessor, ViTForMaskedImageModeling
+         >>> import torch
+         >>> from PIL import Image
+         >>> import requests
+
+         >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+         >>> image = Image.open(requests.get(url, stream=True).raw)
+
+         >>> image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
+         >>> model = ViTForMaskedImageModeling.from_pretrained("google/vit-base-patch16-224-in21k")
+
+         >>> num_patches = (model.config.image_size // model.config.patch_size) ** 2
+         >>> pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
+         >>> # create random boolean mask of shape (batch_size, num_patches)
+         >>> bool_masked_pos = torch.randint(low=0, high=2, size=(1, num_patches)).bool()
+
+         >>> outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
+         >>> loss, reconstructed_pixel_values = outputs.loss, outputs.reconstruction
+         >>> list(reconstructed_pixel_values.shape)
+         [1, 3, 224, 224]
+         ```"""
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if bool_masked_pos is not None and (self.config.patch_size != self.config.encoder_stride):
+             raise ValueError(
+                 "When `bool_masked_pos` is provided, `patch_size` must be equal to `encoder_stride` to ensure that "
+                 "the reconstructed image has the same dimensions as the input. "
+                 f"Got `patch_size` = {self.config.patch_size} and `encoder_stride` = {self.config.encoder_stride}."
+             )
+
+         outputs = self.vit(
+             pixel_values,
+             bool_masked_pos=bool_masked_pos,
+             head_mask=head_mask,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             interpolate_pos_encoding=interpolate_pos_encoding,
+             return_dict=return_dict,
+         )
+
+         sequence_output = outputs[0]
+
+         # Reshape to (batch_size, num_channels, height, width)
+         sequence_output = sequence_output[:, 1:]
+         batch_size, sequence_length, num_channels = sequence_output.shape
+         height = width = math.floor(sequence_length**0.5)
+         sequence_output = sequence_output.permute(0, 2, 1).reshape(batch_size, num_channels, height, width)
+
+         # Reconstruct pixel values
+         reconstructed_pixel_values = self.decoder(sequence_output)
+
+         masked_im_loss = None
+         if bool_masked_pos is not None:
+             size = self.config.image_size // self.config.patch_size
+             bool_masked_pos = bool_masked_pos.reshape(-1, size, size)
+             mask = (
+                 bool_masked_pos.repeat_interleave(self.config.patch_size, 1)
+                 .repeat_interleave(self.config.patch_size, 2)
+                 .unsqueeze(1)
+                 .contiguous()
+             )
+             reconstruction_loss = nn.functional.l1_loss(pixel_values, reconstructed_pixel_values, reduction="none")
+             masked_im_loss = (reconstruction_loss * mask).sum() / (mask.sum() + 1e-5) / self.config.num_channels
+
+         if not return_dict:
+             output = (reconstructed_pixel_values,) + outputs[1:]
+             return ((masked_im_loss,) + output) if masked_im_loss is not None else output
+
+         return MaskedImageModelingOutput(
+             loss=masked_im_loss,
+             reconstruction=reconstructed_pixel_values,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
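The masked-loss block above expands the patch-level `bool_masked_pos` into a pixel-level mask with two `repeat_interleave` calls before weighting the per-pixel L1 loss. A toy run of just that expansion, with deliberately tiny sizes (a 2x2 patch grid and `patch_size = 2`, not the real config values):

```python
import torch

patch_size = 2
size = 2  # image_size // patch_size
bool_masked_pos = torch.tensor([[1, 0, 0, 1]]).bool()  # (batch_size=1, num_patches=4)

mask = (
    bool_masked_pos.reshape(-1, size, size)
    .repeat_interleave(patch_size, 1)
    .repeat_interleave(patch_size, 2)
    .unsqueeze(1)
    .contiguous()
)
print(mask.shape)        # torch.Size([1, 1, 4, 4])
print(mask[0, 0].int())  # each masked patch becomes a 2x2 block of ones
```

Dividing by `mask.sum() + 1e-5` and then by `num_channels` normalizes the loss per masked pixel and per channel, with the epsilon guarding against an all-zero mask.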
+ @add_start_docstrings(
+     """
+     ViT Model transformer with an image classification head on top (a linear layer on top of the final hidden state of
+     the [CLS] token) e.g. for ImageNet.
+
+     <Tip>
+
+     Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by
+     setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained
+     position embeddings to the higher resolution.
+
+     </Tip>
+     """,
+     VIT_START_DOCSTRING,
+ )
+ class ViTForImageClassification(ViTPreTrainedModel):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__(config)
+
+         self.num_labels = config.num_labels
+         self.vit = ViTModel(config, add_pooling_layer=False)
+
+         # Classifier head
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels) if config.num_labels > 0 else nn.Identity()
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     @add_start_docstrings_to_model_forward(VIT_INPUTS_DOCSTRING)
+     @add_code_sample_docstrings(
+         checkpoint=_IMAGE_CLASS_CHECKPOINT,
+         output_type=ImageClassifierOutput,
+         config_class=_CONFIG_FOR_DOC,
+         expected_output=_IMAGE_CLASS_EXPECTED_OUTPUT,
+     )
+     def forward(
+         self,
+         pixel_values: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         interpolate_pos_encoding: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[tuple, ImageClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the image classification/regression loss. Indices should be in `[0, ...,
+             config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
+             `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.vit(
+             pixel_values,
+             head_mask=head_mask,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             interpolate_pos_encoding=interpolate_pos_encoding,
+             return_dict=return_dict,
+         )
+
+         sequence_output = outputs[0]
+
+         logits = self.classifier(sequence_output[:, 0, :])
+
+         loss = None
+         if labels is not None:
+             # move labels to correct device to enable model parallelism
+             labels = labels.to(logits.device)
+             if self.config.problem_type is None:
+                 if self.num_labels == 1:
+                     self.config.problem_type = "regression"
+                 elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
+                     self.config.problem_type = "single_label_classification"
+                 else:
+                     self.config.problem_type = "multi_label_classification"
+
+             if self.config.problem_type == "regression":
+                 loss_fct = MSELoss()
+                 if self.num_labels == 1:
+                     loss = loss_fct(logits.squeeze(), labels.squeeze())
+                 else:
+                     loss = loss_fct(logits, labels)
+             elif self.config.problem_type == "single_label_classification":
+                 loss_fct = CrossEntropyLoss()
+                 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+             elif self.config.problem_type == "multi_label_classification":
+                 loss_fct = BCEWithLogitsLoss()
+                 loss = loss_fct(logits, labels)
+
+         if not return_dict:
+             output = (logits,) + outputs[1:]
+             return ((loss,) + output) if loss is not None else output
+
+         return ImageClassifierOutput(
+             loss=loss,
+             logits=logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
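As the Tip above notes, `interpolate_pos_encoding=True` lets the classifier run on resolutions other than the pre-training one. A hedged sketch, where the checkpoint name and the 384x384 input are assumptions and a real pipeline would use the image processor instead of random pixels:

```python
import torch
from transformers import ViTForImageClassification

# Illustrative 1000-class ImageNet checkpoint, pre-trained at 224x224.
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# A 384x384 input: the pre-trained position embeddings are interpolated to the larger patch grid.
pixel_values = torch.randn(1, 3, 384, 384)
with torch.no_grad():
    outputs = model(pixel_values, interpolate_pos_encoding=True)
print(outputs.logits.shape)  # torch.Size([1, 1000])
```

Note also the `problem_type` dispatch in `forward`: integer labels with `num_labels > 1` select cross-entropy, a single label selects MSE regression, and float label vectors select BCE-with-logits for multi-label classification.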
+ @add_start_docstrings(
+     """
+     ViT Model transformer with a semantic segmentation head on top (a layer norm, a linear projection and a linear
+     classifier applied to the patch tokens, with bilinear upsampling in between), e.g. for ADE20k.
+
+     <Tip>
+
+     Note that it's possible to fine-tune ViT on higher resolution images than the ones it has been trained on, by
+     setting `interpolate_pos_encoding` to `True` in the forward of the model. This will interpolate the pre-trained
+     position embeddings to the higher resolution.
+
+     </Tip>
+     """,
+     VIT_START_DOCSTRING,
+ )
+ class ViTForSemanticSegmentation(ViTPreTrainedModel):
+     def __init__(self, config: ViTConfig) -> None:
+         super().__init__(config)
+
+         self.num_labels = config.num_labels
+         self.vit = ViTModel(config, add_pooling_layer=False)
+
+         # side length of the patch grid, e.g. 224 // 16 = 14
+         self.hw_shape = config.image_size // config.patch_size
+         # Segmentation decoder: norm -> linear projection -> (bilinear upsampling in forward) -> linear classifier
+         self.decoder_norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+         self.decoder_mlp = nn.Linear(config.hidden_size, 256)
+         self.decoder_classifier = nn.Linear(256, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     @add_start_docstrings_to_model_forward(VIT_INPUTS_DOCSTRING)
+     # @add_code_sample_docstrings(
+     #     checkpoint=_IMAGE_CLASS_CHECKPOINT,
+     #     output_type=SemanticSegmenterOutput,
+     #     config_class=_CONFIG_FOR_DOC,
+     #     expected_output=_IMAGE_CLASS_EXPECTED_OUTPUT,
+     # )
+     def forward(
+         self,
+         pixel_values: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         interpolate_pos_encoding: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[tuple, SemanticSegmenterOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
+             Ground truth semantic segmentation maps for computing the loss. Indices should be in `[0, ...,
+             config.num_labels - 1]`. If `config.num_labels > 1` a (per-pixel) classification loss is computed
+             (Cross-Entropy); if `config.num_labels == 1` a binary loss is computed (BCE-with-logits).
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.vit(
+             pixel_values,
+             head_mask=head_mask,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             interpolate_pos_encoding=interpolate_pos_encoding,
+             return_dict=return_dict,
+         )
+
+         sequence_output = outputs[0]
+
+         out = sequence_output[:, 1:]  # drop the [CLS] token, keep the patch tokens
+         out = self.decoder_norm(out)
+         B, _, C = out.shape
+         out = out.reshape(B, self.hw_shape, self.hw_shape, C)
+         out = self.decoder_mlp(out)  # (B, hw_shape, hw_shape, 256)
+         # `interpolate` expects channels-first input, so move the channel dimension before upsampling
+         out = out.permute(0, 3, 1, 2).contiguous()  # (B, 256, hw_shape, hw_shape)
+         out = nn.functional.interpolate(out, scale_factor=4, mode="bilinear", align_corners=False)
+
+         # the linear classifier acts on the last dimension, so move channels back before applying it
+         logits = self.decoder_classifier(out.permute(0, 2, 3, 1))
+         logits = logits.permute(0, 3, 1, 2).contiguous()  # (B, num_labels, 4 * hw_shape, 4 * hw_shape)
+
+         loss = None
+         if labels is not None:
+             # upsample logits to the images' original size
+             upsampled_logits = nn.functional.interpolate(
+                 logits, size=labels.shape[-2:], mode="bilinear", align_corners=False
+             )
+             if self.config.num_labels > 1:
+                 loss_fct = CrossEntropyLoss(ignore_index=self.config.semantic_loss_ignore_index)
+                 loss = loss_fct(upsampled_logits, labels)
+             elif self.config.num_labels == 1:
+                 valid_mask = ((labels >= 0) & (labels != self.config.semantic_loss_ignore_index)).float()
+                 loss_fct = BCEWithLogitsLoss(reduction="none")
+                 loss = loss_fct(upsampled_logits.squeeze(1), labels.float())
+                 loss = (loss * valid_mask).mean()
+             else:
+                 raise ValueError(f"Number of labels should be > 0: {self.config.num_labels}")
+
+         if not return_dict:
+             output = (logits,) + outputs[1:]
+             return ((loss,) + output) if loss is not None else output
+
+         return SemanticSegmenterOutput(
+             loss=loss,
+             logits=logits,
+             hidden_states=outputs.hidden_states if output_hidden_states else None,
+             attentions=outputs.attentions,
+         )
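A minimal smoke-test sketch of the class above, built from a randomly initialized config rather than a trained checkpoint; `num_labels=150` (ADE20k-style) is an assumption, as are the usual ViT defaults (`image_size=224`, `patch_size=16`) in the accompanying `configuration_vit.py`:

```python
import torch
from configuration_vit import ViTConfig
from modeling_vit import ViTForSemanticSegmentation

# Assumes the usual ViT defaults: image_size=224, patch_size=16, hidden_size=768.
config = ViTConfig(num_labels=150)  # 150 classes is an ADE20k-style assumption
model = ViTForSemanticSegmentation(config)

pixel_values = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    outputs = model(pixel_values)
# 224 // 16 = 14 patches per side, upsampled x4 by the decoder -> 56x56 logits
print(outputs.logits.shape)  # torch.Size([1, 150, 56, 56])
```

When `labels` are passed, the logits are further upsampled to the label resolution before the loss, so the 56x56 head output does not constrain the ground-truth size.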
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:172662e48b46ddb302f136e458c4915d38c781e37becbc7b4842a8bdeceba33f
+ size 345082557
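The three lines above are a Git LFS pointer: per the spec linked in the `version` line, `oid` is the SHA-256 of the real binary and `size` is its byte length. A hypothetical integrity check for a downloaded `pytorch_model.bin` (the local path is an assumption):

```python
import hashlib

EXPECTED_OID = "172662e48b46ddb302f136e458c4915d38c781e37becbc7b4842a8bdeceba33f"
EXPECTED_SIZE = 345082557

sha256 = hashlib.sha256()
size = 0
with open("pytorch_model.bin", "rb") as f:  # assumed local path
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        sha256.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"size mismatch: {size} != {EXPECTED_SIZE}"
assert sha256.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("pytorch_model.bin matches its LFS pointer")
```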