INVERTO committed on
Commit 10e4167 · verified · 1 Parent(s): ba41999

Upload trained bird captioning model, tokenizer, image processor, species mapping, and captions

README.md ADDED
@@ -0,0 +1,94 @@
+
+ # Bird Captioning and Classification Model (CUB-200-2011)
+
+ This is a fine-tuned VisionEncoderDecoderModel based on `nlpconnect/vit-gpt2-image-captioning`, trained on the CUB-200-2011 dataset for bird species classification and image captioning.
+
+ ## Model Description
+ - **Base Model**: ViT-GPT2 (`nlpconnect/vit-gpt2-image-captioning`)
+ - **Tasks**:
+   - Generates descriptive captions for bird images, including species and attributes.
+   - Classifies images into one of 200 bird species.
+ - **Dataset**: CUB-200-2011 (11,788 images, 200 bird species)
+ - **Training**: 10 epochs, batch size 16, mixed precision, AdamW optimizer (lr=3e-5), combined loss (caption + 0.5 * classification).
+ - **Best Validation Loss**: 0.0690 (Epoch 3)
+
+ ## Files
+ - `model.safetensors`: Trained model weights
+ - `config.json`: Model configuration
+ - `preprocessor_config.json`: ViTImageProcessor settings
+ - `tokenizer_config.json`, `tokenizer.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`: GPT2 tokenizer files
+ - `species_mapping.txt`: Mapping of class indices to bird species names
+ - `cub200_captions.csv`: Generated captions for the dataset
+ - `model.py`: Custom `BirdCaptioningModel` class definition
+
+ ## Usage
+ ### Prerequisites
+ ```bash
+ pip install transformers torch huggingface_hub pillow
+ ```
+
+ ### Load Model and Dependencies
+ ```python
+ from transformers import ViTImageProcessor, AutoTokenizer
+ from huggingface_hub import PyTorchModelHubMixin
+ import torch
+ from model import BirdCaptioningModel  # Save model.py locally
+
+ # Load model
+ model = BirdCaptioningModel.from_pretrained("INVERTO/bird-captioning-cub200")
+ image_processor = ViTImageProcessor.from_pretrained("INVERTO/bird-captioning-cub200")
+ tokenizer = AutoTokenizer.from_pretrained("INVERTO/bird-captioning-cub200")
+ model.eval()
+
+ # Load species mapping
+ species_mapping = {}
+ with open("species_mapping.txt", "r") as f:  # Save species_mapping.txt locally
+     for line in f:
+         idx, name = line.strip().split(",", 1)
+         species_mapping[int(idx)] = name
+ ```
+
+ ### Inference
+ ```python
+ from PIL import Image
+
+ def predict_bird_image(image_path):
+     image = Image.open(image_path).convert("RGB")
+     pixel_values = image_processor(image, return_tensors="pt").pixel_values
+     with torch.no_grad():
+         output_ids = model.base_model.generate(pixel_values, max_length=75, num_beams=4)
+         _, class_logits = model(pixel_values, input_ids=output_ids)  # decoder needs input ids; reusing the generated ones is an assumption
+     predicted_class_idx = torch.argmax(class_logits, dim=1).item()
+     confidence = torch.nn.functional.softmax(class_logits, dim=1)[0, predicted_class_idx].item() * 100
+     caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+     species = species_mapping.get(predicted_class_idx, "Unknown")
+     return caption, species, confidence
+
+ # Example
+ caption, species, confidence = predict_bird_image("/kaggle/input/cub2002011/CUB_200_2011/images/006.Least_Auklet/Least_Auklet_0007_795123.jpg")
+ print(f"Caption: {caption}")
+ print(f"Species: {species}")
+ print(f"Confidence: {confidence:.2f}%")
+ ```
+
+ ## Dataset
+ - **CUB-200-2011**: 11,788 images of 200 bird species with attribute annotations.
+ - Captions were generated based on species names and attributes (e.g., bill shape, wing color).
+
+ ## Training Details
+ - **Loss**: Combined captioning (CrossEntropy) and classification (CrossEntropy) loss.
+ - **Optimizer**: AdamW (lr=3e-5)
+ - **Scheduler**: CosineAnnealingLR
+ - **Hardware**: GPU (CUDA)
+ - **Training Time**: ~5 min/epoch
+
+ ## Limitations
+ - May overfit after Epoch 3 (validation loss increases).
+ - Captions are limited to species and up to 5 attributes.
+ - Classification accuracy not explicitly reported.
+
+ ## License
+ MIT License
+
+ ## Contact
+ For issues, contact INVERTO on Hugging Face.
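The README's inference snippet reports `confidence` as the softmax probability of the top-scoring class, multiplied by 100. As a minimal sketch of that arithmetic in plain Python (the three logits are made-up values, and `softmax` is a helper written for this note, not part of the repository):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three classes (the real head has 200 outputs).
class_logits = [2.0, 0.5, -1.0]
probs = softmax(class_logits)

# Same steps as the README: argmax for the class, softmax value for the score.
predicted_class_idx = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_class_idx] * 100
print(f"class {predicted_class_idx}, confidence {confidence:.2f}%")
```

The percentage is a relative score over the 200 classes and is not necessarily a calibrated probability.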
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "base_model": "nlpconnect/vit-gpt2-image-captioning",
+   "hidden_size": 768,
+   "num_classes": 200
+ }
cub200_captions.csv ADDED
The diff for this file is too large to render. See raw diff
 
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.py ADDED
@@ -0,0 +1,24 @@
+
+ from huggingface_hub import PyTorchModelHubMixin
+ import torch
+ import torch.nn as nn
+ from transformers import VisionEncoderDecoderModel
+
+ class BirdCaptioningModel(nn.Module, PyTorchModelHubMixin):
+     def __init__(self, num_classes=200):
+         super().__init__()
+         self.base_model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")
+         self.hidden_size = self.base_model.decoder.config.hidden_size
+         self.classifier = nn.Linear(self.hidden_size, num_classes)
+
+     def forward(self, pixel_values, input_ids=None, attention_mask=None):
+         outputs = self.base_model(
+             pixel_values=pixel_values,
+             decoder_input_ids=input_ids,
+             decoder_attention_mask=attention_mask,
+             output_hidden_states=True,
+             return_dict=True
+         )
+         hidden_states = outputs.decoder_hidden_states[-1][:, 0, :]
+         class_logits = self.classifier(hidden_states)
+         return outputs.logits, class_logits
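`forward` returns two sets of logits, and the README describes the training objective as caption loss + 0.5 × classification loss, both cross-entropy. A rough sketch of that combination in plain Python, assuming each term is a mean cross-entropy; the helper names and toy numbers are illustrative and not taken from the training script, which is not part of this commit:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer target."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def combined_loss(caption_logits, caption_targets, class_logits, class_target,
                  class_weight=0.5):
    """Caption loss (mean over tokens) + class_weight * classification loss."""
    token_losses = [cross_entropy(l, t)
                    for l, t in zip(caption_logits, caption_targets)]
    caption_loss = sum(token_losses) / len(token_losses)
    class_loss = cross_entropy(class_logits, class_target)
    return caption_loss + class_weight * class_loss

# Toy example: 2 caption tokens over a 3-word vocabulary, 4 species classes.
loss = combined_loss(
    caption_logits=[[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
    caption_targets=[0, 1],
    class_logits=[0.0, 0.0, 0.0, 0.0],
    class_target=2,
)
print(f"combined loss: {loss:.4f}")
```

The 0.5 weight keeps captioning as the dominant term while still passing gradient through the classifier head.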
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa8335ff5c550fcc24f4847f774e9e0b7e832880adaf246e6b0ec81f60b69b06
+ size 957455968
preprocessor_config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "do_convert_rgb": null,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "ViTImageProcessor",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 224,
+     "width": 224
+   }
+ }
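With `do_rescale` and `do_normalize` both enabled, each channel value is first multiplied by `rescale_factor` (1/255) and then normalized with mean 0.5 and standard deviation 0.5, which maps 8-bit pixel values onto [-1, 1]. The sketch below reproduces only that per-value arithmetic; `preprocess_value` is a hypothetical helper, and the resize to 224×224 (`resample: 2`, bilinear in PIL's numbering) is left out:

```python
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, from preprocessor_config.json
IMAGE_MEAN = 0.5                      # same value for all three channels
IMAGE_STD = 0.5

def preprocess_value(pixel):
    """Map one 8-bit channel value (0..255) to the normalized range."""
    rescaled = pixel * RESCALE_FACTOR            # 0..255 -> 0..1
    return (rescaled - IMAGE_MEAN) / IMAGE_STD   # 0..1 -> -1..1

for pixel in (0, 128, 255):
    print(pixel, round(preprocess_value(pixel), 4))
```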
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
species_mapping.txt ADDED
@@ -0,0 +1,200 @@
+ 0,Black footed Albatross
+ 1,Laysan Albatross
+ 2,Sooty Albatross
+ 3,Groove billed Ani
+ 4,Crested Auklet
+ 5,Least Auklet
+ 6,Parakeet Auklet
+ 7,Rhinoceros Auklet
+ 8,Brewer Blackbird
+ 9,Red winged Blackbird
+ 10,Rusty Blackbird
+ 11,Yellow headed Blackbird
+ 12,Bobolink
+ 13,Indigo Bunting
+ 14,Lazuli Bunting
+ 15,Painted Bunting
+ 16,Cardinal
+ 17,Spotted Catbird
+ 18,Gray Catbird
+ 19,Yellow breasted Chat
+ 20,Eastern Towhee
+ 21,Chuck will Widow
+ 22,Brandt Cormorant
+ 23,Red faced Cormorant
+ 24,Pelagic Cormorant
+ 25,Bronzed Cowbird
+ 26,Shiny Cowbird
+ 27,Brown Creeper
+ 28,American Crow
+ 29,Fish Crow
+ 30,Black billed Cuckoo
+ 31,Mangrove Cuckoo
+ 32,Yellow billed Cuckoo
+ 33,Gray crowned Rosy Finch
+ 34,Purple Finch
+ 35,Northern Flicker
+ 36,Acadian Flycatcher
+ 37,Great Crested Flycatcher
+ 38,Least Flycatcher
+ 39,Olive sided Flycatcher
+ 40,Scissor tailed Flycatcher
+ 41,Vermilion Flycatcher
+ 42,Yellow bellied Flycatcher
+ 43,Frigatebird
+ 44,Northern Fulmar
+ 45,Gadwall
+ 46,American Goldfinch
+ 47,European Goldfinch
+ 48,Boat tailed Grackle
+ 49,Eared Grebe
+ 50,Horned Grebe
+ 51,Pied billed Grebe
+ 52,Western Grebe
+ 53,Blue Grosbeak
+ 54,Evening Grosbeak
+ 55,Pine Grosbeak
+ 56,Rose breasted Grosbeak
+ 57,Pigeon Guillemot
+ 58,California Gull
+ 59,Glaucous winged Gull
+ 60,Heermann Gull
+ 61,Herring Gull
+ 62,Ivory Gull
+ 63,Ring billed Gull
+ 64,Slaty backed Gull
+ 65,Western Gull
+ 66,Anna Hummingbird
+ 67,Ruby throated Hummingbird
+ 68,Rufous Hummingbird
+ 69,Green Violetear
+ 70,Long tailed Jaeger
+ 71,Pomarine Jaeger
+ 72,Blue Jay
+ 73,Florida Jay
+ 74,Green Jay
+ 75,Dark eyed Junco
+ 76,Tropical Kingbird
+ 77,Gray Kingbird
+ 78,Belted Kingfisher
+ 79,Green Kingfisher
+ 80,Pied Kingfisher
+ 81,Ringed Kingfisher
+ 82,White breasted Kingfisher
+ 83,Red legged Kittiwake
+ 84,Horned Lark
+ 85,Pacific Loon
+ 86,Mallard
+ 87,Western Meadowlark
+ 88,Hooded Merganser
+ 89,Red breasted Merganser
+ 90,Mockingbird
+ 91,Nighthawk
+ 92,Clark Nutcracker
+ 93,White breasted Nuthatch
+ 94,Baltimore Oriole
+ 95,Hooded Oriole
+ 96,Orchard Oriole
+ 97,Scott Oriole
+ 98,Ovenbird
+ 99,Brown Pelican
+ 100,White Pelican
+ 101,Western Wood Pewee
+ 102,Sayornis
+ 103,American Pipit
+ 104,Whip poor Will
+ 105,Horned Puffin
+ 106,Common Raven
+ 107,White necked Raven
+ 108,American Redstart
+ 109,Geococcyx
+ 110,Loggerhead Shrike
+ 111,Great Grey Shrike
+ 112,Baird Sparrow
+ 113,Black throated Sparrow
+ 114,Brewer Sparrow
+ 115,Chipping Sparrow
+ 116,Clay colored Sparrow
+ 117,House Sparrow
+ 118,Field Sparrow
+ 119,Fox Sparrow
+ 120,Grasshopper Sparrow
+ 121,Harris Sparrow
+ 122,Henslow Sparrow
+ 123,Le Conte Sparrow
+ 124,Lincoln Sparrow
+ 125,Nelson Sharp tailed Sparrow
+ 126,Savannah Sparrow
+ 127,Seaside Sparrow
+ 128,Song Sparrow
+ 129,Tree Sparrow
+ 130,Vesper Sparrow
+ 131,White crowned Sparrow
+ 132,White throated Sparrow
+ 133,Cape Glossy Starling
+ 134,Bank Swallow
+ 135,Barn Swallow
+ 136,Cliff Swallow
+ 137,Tree Swallow
+ 138,Scarlet Tanager
+ 139,Summer Tanager
+ 140,Artic Tern
+ 141,Black Tern
+ 142,Caspian Tern
+ 143,Common Tern
+ 144,Elegant Tern
+ 145,Forsters Tern
+ 146,Least Tern
+ 147,Green tailed Towhee
+ 148,Brown Thrasher
+ 149,Sage Thrasher
+ 150,Black capped Vireo
+ 151,Blue headed Vireo
+ 152,Philadelphia Vireo
+ 153,Red eyed Vireo
+ 154,Warbling Vireo
+ 155,White eyed Vireo
+ 156,Yellow throated Vireo
+ 157,Bay breasted Warbler
+ 158,Black and white Warbler
+ 159,Black throated Blue Warbler
+ 160,Blue winged Warbler
+ 161,Canada Warbler
+ 162,Cape May Warbler
+ 163,Cerulean Warbler
+ 164,Chestnut sided Warbler
+ 165,Golden winged Warbler
+ 166,Hooded Warbler
+ 167,Kentucky Warbler
+ 168,Magnolia Warbler
+ 169,Mourning Warbler
+ 170,Myrtle Warbler
+ 171,Nashville Warbler
+ 172,Orange crowned Warbler
+ 173,Palm Warbler
+ 174,Pine Warbler
+ 175,Prairie Warbler
+ 176,Prothonotary Warbler
+ 177,Swainson Warbler
+ 178,Tennessee Warbler
+ 179,Wilson Warbler
+ 180,Worm eating Warbler
+ 181,Yellow Warbler
+ 182,Northern Waterthrush
+ 183,Louisiana Waterthrush
+ 184,Bohemian Waxwing
+ 185,Cedar Waxwing
+ 186,American Three toed Woodpecker
+ 187,Pileated Woodpecker
+ 188,Red bellied Woodpecker
+ 189,Red cockaded Woodpecker
+ 190,Red headed Woodpecker
+ 191,Downy Woodpecker
+ 192,Bewick Wren
+ 193,Cactus Wren
+ 194,Carolina Wren
+ 195,House Wren
+ 196,Marsh Wren
+ 197,Rock Wren
+ 198,Winter Wren
+ 199,Common Yellowthroat
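The mapping is zero-based, while CUB-200-2011 image folders carry a one-based numeric prefix: the README's example path uses `006.Least_Auklet`, which corresponds to index 5 here. A small sketch of parsing the file format and converting a folder name to an index follows; `parse_species_mapping` and `folder_to_index` are hypothetical helpers, and the off-by-one conversion is inferred from that one example rather than stated anywhere in the commit:

```python
def parse_species_mapping(lines):
    """Parse 'index,name' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        idx, name = line.split(",", 1)
        mapping[int(idx)] = name
    return mapping

def folder_to_index(folder_name):
    """'006.Least_Auklet' -> 5 (folder prefixes are assumed to be one-based)."""
    prefix = folder_name.split(".", 1)[0]
    return int(prefix) - 1

# Three lines copied from species_mapping.txt.
sample = ["4,Crested Auklet", "5,Least Auklet", "6,Parakeet Auklet"]
species_mapping = parse_species_mapping(sample)
print(species_mapping[folder_to_index("006.Least_Auklet")])
```

A lookup like this gives a quick way to compare a predicted index against the ground-truth folder of a test image.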
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "50256": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "max_length": 32,
+   "model_max_length": 1024,
+   "pad_to_multiple_of": null,
+   "pad_token": "<|endoftext|>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "stride": 0,
+   "tokenizer_class": "GPT2Tokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<|endoftext|>"
+ }
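This configuration reuses `<|endoftext|>` (id 50256) as `pad_token` and pads on the right. A sketch of what right-padding a caption to the configured `max_length` of 32 looks like, with an attention mask marking the real tokens; `pad_right` is an illustrative helper and the token ids are invented, not real tokenizer output:

```python
PAD_TOKEN_ID = 50256  # id of <|endoftext|> in tokenizer_config.json
MAX_LENGTH = 32       # "max_length" in tokenizer_config.json

def pad_right(token_ids, max_length=MAX_LENGTH, pad_id=PAD_TOKEN_ID):
    """Truncate or right-pad token ids; return (input_ids, attention_mask)."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding

input_ids, attention_mask = pad_right([1212, 318, 257, 6512])  # made-up ids
print(len(input_ids), sum(attention_mask))
```

Because the padding and end-of-text tokens share one id, the attention mask is what distinguishes padding from a genuine end of caption.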
vocab.json ADDED
The diff for this file is too large to render. See raw diff