Commit 0e8c75e
Parent(s): 5fcaacd
Update README.md

README.md CHANGED
@@ -1,8 +1,8 @@
 # Hindi Image Captioning Model
 
-This is an encoder-decoder image captioning model made with VIT encoder and GPT2-Hindi as a decoder. This is a first attempt at using ViT + GPT2-Hindi for image captioning task. We used the Flickr8k Hindi Dataset
+This is an encoder-decoder image captioning model with a ViT encoder and GPT2-Hindi as the decoder. It is a first attempt at using ViT + GPT2-Hindi for an image captioning task. We used the Flickr8k Hindi Dataset, available on Kaggle, to train the model.
 
-This model was trained using HuggingFace course community week, organized by Huggingface.
+This model was trained during the HuggingFace course community week, organized by Hugging Face.
 
 ## How to use
 
@@ -21,8 +21,11 @@ else:
 url = 'https://shorturl.at/fvxEQ'
 image = Image.open(requests.get(url, stream=True).raw)
 
-
-
+encoder_checkpoint = 'google/vit-base-patch16-224'
+decoder_checkpoint = 'surajp/gpt2-hindi'
+
+feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_checkpoint)
+tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
 model = VisionEncoderDecoderModel.from_pretrained('team-indain-image-caption/hindi-image-captioning').to(device)
 
 #Inference
@@ -32,4 +35,10 @@ clean_text = lambda x: x.replace('<|endoftext|>','').split('\n')[0]
 caption_ids = model.generate(sample, max_length = 50)[0]
 caption_text = clean_text(tokenizer.decode(caption_ids))
 print(caption_text)
-```
+```
+
+## Training data
+We used the Flickr8k Hindi Dataset, which is a translated version of the original Flickr8k Dataset, available on Kaggle, to train the model.
+
+## Training procedure
+This model was trained during the HuggingFace course community week, organized by Hugging Face. The training was done on a Kaggle GPU.