Imagroune / LLMEyeCap


Imagroune committed on Sep 15, 2023

Commit

37fe9ed

·

1 Parent(s): 5f6da0d

Create README.md

Files changed (1)

README.md +81 -0

README.md ADDED

	@@ -0,0 +1,81 @@

+# LLMEyeCap: Giving Eyes to Large Language Models
+## Model Description
+LLMEyeCap is a Novel Object Captioning model designed to extend the capabilities of Large Language Models with vision. It combines a ResNet50 backbone, a customized DETR detector, and Transformer decoder captioning layers to detect objects in an image and generate a caption for each detected object.
+### Features
+- **Novel Object Captioning + Bounding Boxes**
+- **ResNet50 as a backbone**
+- **Customized DETR model for bounding box detection**
+- **BERT Tokenizer and GPT-2 for text generation**
+- **Transformer decoder object-captioning layers in place of the classification layers**
+## Training Data
+The model was trained on the following datasets:
+- VOC Dataset
+- COCO 80
+- COCO 91
+
+Training was carried out for 30 epochs.
+## Usage
+Here's how to use this model for object captioning:
+```python
+import torch
+import matplotlib.pyplot as plt
+import matplotlib.patches as patches
+from PIL import Image
+
+# LLMEyeCapModel, tokenizer, device, NUM_QUERIES, vocab_size, PAD_TOKEN and PAD_SOS
+# are not defined in this snippet and must be set up beforehand.
+model = LLMEyeCapModel(num_queries=NUM_QUERIES, vocab_size=vocab_size, pad_token=PAD_TOKEN)
+model = model.to(device)
+state_dict = torch.load("LLMEyeCap_01.bin", map_location=device)
+model.load_state_dict(state_dict)
+
+def display_image_ds(image_path, bb, ll):
+    """Draw the predicted boxes (bb) and their decoded captions (ll) on the image."""
+    image = Image.open(image_path).convert('RGB')
+    fig, ax = plt.subplots(1, 1, figsize=(12, 20))  # Set the figure size
+    ax.imshow(image)
+    # Draw bounding boxes and labels for the first item of the batch
+    for box, label in zip(bb[0], ll[0]):
+        # Boxes are normalized (center x, center y, width, height)
+        (x, y, w, h) = box.tolist()
+        label_str = tokenizer.decode(label, skip_special_tokens=True)
+        # Skip empty boxes and the 'na' (no object) caption
+        if (x == 0 and y == 0 and w == 0 and h == 0) or label_str == 'na':
+            continue
+        # Scale to pixel coordinates
+        x *= image.width
+        y *= image.height
+        w *= image.width
+        h *= image.height
+        # Rectangle expects the top-left corner, so shift from the box center
+        rect = patches.Rectangle((x - w / 2, y - h / 2), w, h, linewidth=2, edgecolor='r', facecolor='none')
+        ax.add_patch(rect)
+        ax.text(x - w / 2, y - h / 2, label_str, color='r', bbox=dict(facecolor='white', edgecolor='r', pad=2), fontsize=18)
+    plt.show()
+
+image_paths = ["../data/coco91/train2017/000000291557.jpg", "../data/coco91/train2017/000000436027.jpg"]
+for im in image_paths:
+    bb, cc = model.generate_caption(im, tokenizer, max_length=20, pad_sos=PAD_SOS)
+    display_image_ds(im, bb.to('cpu'), cc.to('cpu'))
+```
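The drawing code in the usage example treats each predicted box as a normalized (center x, center y, width, height) tuple: it scales the values by the image size and shifts by half the width and height to get the top-left corner that `patches.Rectangle` expects. A minimal sketch of that conversion as a standalone helper follows; the name `center_box_to_pixels` is hypothetical and not part of the LLMEyeCap code.

```python
def center_box_to_pixels(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (left, top, width, height).

    Hypothetical helper for illustration; it mirrors the arithmetic used in
    the README's drawing loop and is not part of the model's API.
    """
    cx, cy, w, h = box
    # Scale normalized values to pixel units
    cx *= image_width
    cy *= image_height
    w *= image_width
    h *= image_height
    # Shift from the box center to its top-left corner
    return (cx - w / 2, cy - h / 2, w, h)


# A box centered in a 640x480 image, a quarter as wide and half as tall
print(center_box_to_pixels((0.5, 0.5, 0.25, 0.5), 640, 480))
# (240.0, 120.0, 160.0, 240.0)
```

Keeping the conversion in one function makes the drawing loop easier to test without loading the model or an image.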
+### Results
+See the `tuto.ipynb` file.
+## Limitations and Future Work
+Version 0.1 is a standalone model for captioning objects in images. It can be used as is or trained on new objects without "catastrophic forgetting".
+Version 0.2 is planned to add a latent space that connects to the hidden dimensions of LLMs.
+## Authors
+Imed MAGROUNE.