Sher1988 committed on
Commit
b66ff37
·
verified ·
1 Parent(s): c00940f

Delete README.md

Browse files
Files changed (1)
  1. README.md +0 -205
README.md DELETED
@@ -1,205 +0,0 @@
---
title: Caption Gen
emoji: 📸
sdk: streamlit
sdk_version: 1.43.0
app_file: app.py
---

# AI Image Caption Generator

A deep learning–based image captioning system built using a **ResNet50 encoder** and an **LSTM decoder**. The model generates natural-language descriptions for uploaded images.

## Architecture

* **Encoder:** ResNet50 (frozen backbone)
* **Decoder:** LSTM-based sequence generator
* **Training Dataset:** Flickr8k
* **Inference Framework:** Streamlit
* **Evaluation Metric:** SacreBLEU

The encoder extracts high-level visual features, which are then passed to the decoder to generate the caption word by word.
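The encoder–decoder wiring can be sketched in a few lines of PyTorch. This is a minimal illustration only: a tiny stand-in CNN replaces the real ResNet50 backbone (so it runs without downloading weights), and the dimensions, class names, and the feature-as-first-token decoding scheme are assumptions, not the project's exact code.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stand-in for the frozen ResNet50 backbone: maps an image to a feature vector."""
    def __init__(self, embed_dim=256):
        super().__init__()
        # Tiny CNN used only for illustration; the real project uses ResNet50.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        for p in self.backbone.parameters():  # frozen backbone: no gradients
            p.requires_grad = False
        self.fc = nn.Linear(8, embed_dim)     # trainable projection to embed space

    def forward(self, images):
        feats = self.backbone(images).flatten(1)
        return self.fc(feats)

class Decoder(nn.Module):
    """LSTM that generates caption logits conditioned on the image feature."""
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the input sequence.
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)

# One forward pass on dummy data: batch of 2 images, captions of length 10.
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 1000, (2, 10))
logits = Decoder()(Encoder()(images), captions)
print(logits.shape)  # torch.Size([2, 11, 1000])
```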

---

## How It Works

1. User uploads an image.
2. The image is preprocessed and passed through the ResNet50 encoder.
3. The extracted feature vector is fed into the LSTM decoder.
4. A caption is generated using temperature-based sampling.
5. If the image belongs to the Flickr8k dataset, BLEU metrics are displayed.
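Step 4's temperature-based sampling can be sketched in plain Python. The function name and logit values here are hypothetical; the idea is standard: divide the logits by a temperature before the softmax, so low temperatures sharpen the distribution toward the top token and high temperatures flatten it.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample one token index from raw logits, reshaped by temperature.

    temperature < 1 -> peakier distribution (more deterministic captions);
    temperature > 1 -> flatter distribution (more diverse, riskier captions).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                            # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                           # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# With a very low temperature, sampling collapses to argmax:
logits = [2.0, 0.5, -1.0]
picks = {sample_with_temperature(logits, temperature=0.05) for _ in range(100)}
print(picks)  # {0}
```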
57
-
58
-
59
-
60
- ---
61
-
62
-
63
-
64
- \## Features
65
-
66
-
67
-
68
- \* Temperature-controlled caption generation
69
-
70
- \* SacreBLEU evaluation
71
-
72
- \* N-gram precision breakdown (1–4 gram)
73
-
74
- \* Clean Streamlit interface
75
-
76
- \* Fully CPU-compatible deployment
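The app's 1–4-gram breakdown comes from SacreBLEU; the underlying quantity is modified n-gram precision, which a minimal sketch (hypothetical function name, toy sentences) makes concrete: candidate n-gram counts are clipped by the reference counts, then divided by the total candidate n-grams.

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Modified n-gram precision: candidate n-grams clipped by reference counts."""
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    # Clip each candidate n-gram's count by how often it appears in the reference.
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

candidate = "a dog runs on the grass"
reference = "a dog is running on the grass"
precisions = [ngram_precision(candidate, reference, n) for n in range(1, 5)]
print(precisions)  # roughly [0.83, 0.6, 0.25, 0.0] — longer n-grams match less often
```

In practice the app would call `sacrebleu` for the full BLEU score (which also applies a brevity penalty and geometrically averages these four precisions), but the per-n breakdown shown in the UI is exactly this kind of value.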

---

## Project Structure

```
app.py
models/
  encoder.py
  decoder.py
  encoder.pth
  decoder.pth
utils/
  transforms.py
  vocab.py
  helpers.py
vocabulary.json
requirements.txt
```

---

## Model Details

* Encoder weights size: ~92 MB
* Decoder weights size: ~32 MB
* Full encoder backbone included in the state_dict
* Inference runs on CPU
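CPU-only inference with checkpoints of this kind usually hinges on one argument: `torch.load(..., map_location="cpu")`, which remaps tensors saved on a GPU machine onto the CPU. A minimal sketch, using a tiny stand-in module and a temporary file rather than the project's real `models/encoder.pth` / `models/decoder.pth`:

```python
import os
import tempfile
import torch
import torch.nn as nn

# Stand-in module for illustration; the real checkpoints hold the ResNet50
# encoder (~92 MB) and LSTM decoder (~32 MB) state_dicts.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "decoder.pth")
    torch.save(model.state_dict(), path)

    # map_location="cpu" lets GPU-saved checkpoints load on CPU-only hosts.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()  # inference mode: disables dropout / batch-norm updates

print(sorted(state.keys()))  # ['bias', 'weight']
```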

---

## Limitations

* Trained on Flickr8k (8,000 images)
* Performs best on outdoor scenes, people, and animals
* May generalize poorly to unseen domains
* CPU inference can be slow (2–5 seconds per image)

---

## Setup (Local)

```bash
pip install -r requirements.txt
streamlit run app.py
```

---

## Deployment

This project is deployed on **Hugging Face Spaces** using Streamlit.

---

## License

MIT License