electro-sb committed on
Commit 4ac4222 · 0 Parent(s):

The image captioning project has been committed.

.gitattributes ADDED
@@ -0,0 +1,3 @@
+ data/*.jpg filter=lfs diff=lfs merge=lfs -text
+ data/*.jpeg filter=lfs diff=lfs merge=lfs -text
+ data/*.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,37 @@
+
+ # Image Captioning with BLIP
+
+ This project uses the Salesforce BLIP model to generate captions for images. It provides a simple web interface built with Gradio to upload an image and view the generated caption.
+
+ ## Setup
+
+ 1. **Clone the repository:**
+     ```bash
+     git clone https://huggingface.co/spaces/electro-sb/image_captioning
+     cd image_captioning
+     ```
+
+ 2. **Install dependencies:**
+     ```bash
+     pip install -r requirements.txt
+     ```
+
+ 3. **Set up your Hugging Face token:**
+     Create a `.env` file in the root of the project and add your Hugging Face API key:
+     ```
+     HF_API_KEY=<your-hugging-face-api-key>
+     ```
+
+ 4. **Run the application:**
+     ```bash
+     python app.py
+     ```
+
+ The application will be available at `http://localhost:7860`.
+
+ ## Usage
+
+ 1. Open your web browser and navigate to `http://localhost:7860`.
+ 2. Upload an image using the provided interface.
+ 3. Click the "Caption" button to generate a caption for the image.
+ 4. The generated caption will be displayed in the "Caption" textbox.
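Step 3 of the README stores the key in `.env` as `HF_API_KEY`, and `python-dotenv` appears in `requirements.txt`, but `app.py` in this commit does not read the variable. A minimal sketch of how it could be read follows. `load_hf_token` is a hypothetical helper, not part of the commit, and the manual parser is only an assumed fallback for environments where `python-dotenv` is not installed.

```python
import os
from pathlib import Path

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fallback path below is an assumption, not project code
    load_dotenv = None


def load_hf_token(env_path=".env"):
    """Return HF_API_KEY, loading it from env_path first if that file exists.

    Hypothetical helper: values already set in the process environment are
    kept, matching python-dotenv's default of not overriding them.
    """
    path = Path(env_path)
    if path.is_file():
        if load_dotenv is not None:
            load_dotenv(dotenv_path=path)
        else:
            # Minimal KEY=VALUE parser; ignores blank lines and comments.
            for raw in path.read_text(encoding="utf-8").splitlines():
                line = raw.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    return os.environ.get("HF_API_KEY")
```

If the token were needed (for example, for a gated model), the returned value could be passed on to whichever call requires authentication; the model used here is loaded in `app.py` without it.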
app.py ADDED
@@ -0,0 +1,75 @@
+ from transformers import pipeline, AutoTokenizer
+ import io
+ import base64
+ from PIL import Image
+ import gradio as gr
+
+ model = "Salesforce/blip-image-captioning-large"
+ tokenizer = AutoTokenizer.from_pretrained(model, use_fast=True)
+
+ pipe = pipeline(task="image-to-text",
+                 model=model,
+                 tokenizer=tokenizer)
+
+ def image_to_base64(image: Image.Image) -> str:
+     """
+     Convert a PIL image to a base64-encoded PNG string.
+     """
+     buffer = io.BytesIO()  # renamed so the built-in `bytearray` is not shadowed
+     image.save(buffer, format="PNG")
+     return base64.b64encode(buffer.getvalue()).decode("utf-8")
+
+
+ def caption_image(image):
+     result = pipe(
+         image_to_base64(image),
+         # temperature=0.7,
+         # max_length=130,
+         # min_length=30,
+         # do_sample=True
+     )
+     return result[0]['generated_text'].upper()
+
+ if __name__ == "__main__":
+     gr.close_all()
+
+     with gr.Blocks() as interface:
+         gr.Markdown("### Image Captioning using BLIP Large")
+         with gr.Row():
+             image_input = gr.Image(type="pil", label="Image")
+         with gr.Row():
+             caption_output = gr.Textbox(lines=2, label="Caption")
+         with gr.Row():
+             clear_button = gr.ClearButton()
+             caption_button = gr.Button("Caption", variant="primary")
+
+         with gr.Row():
+             example_images = gr.Examples(
+                 examples=[
+                     "data/image1.jpg",
+                     "data/image2.png",
+                     "data/image3.jpg",
+                     "data/image4.jpg",
+                     "data/image5.jpg",
+                     "data/image6.png",
+                     "data/image7.png",
+                     "data/image8.jpeg",
+                     "data/image9.jpeg",
+                     "data/image10.jpg",
+                 ],
+                 inputs=[image_input],
+                 label="Example Images"
+             )
+
+
+         caption_button.click(fn=caption_image,
+                              inputs=[image_input],
+                              outputs=[caption_output]
+                              )
+
+         clear_button.click(fn=lambda: [None, ""],
+                            inputs=[],
+                            outputs=[image_input, caption_output])
+
+     interface.launch(share=True, server_port=7860)
+
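`caption_image` indexes `result[0]['generated_text']` directly, so an empty pipeline result would raise instead of producing a readable message. A hedged sketch of a more defensive post-processing step is below. `format_caption` is a hypothetical helper, not part of the commit, and the list-of-dicts result shape is inferred from the indexing in `app.py` rather than checked against a specific `transformers` version.

```python
from typing import Any, Mapping, Optional, Sequence


def format_caption(
    result: Optional[Sequence[Mapping[str, Any]]],
    default: str = "",
) -> str:
    """Return the first generated caption in upper case, or `default`.

    Assumes `result` looks like [{"generated_text": "..."}], as implied by
    the indexing in app.py.
    """
    if not result:
        # None or an empty list: nothing to show.
        return default
    text = result[0].get("generated_text")
    if not isinstance(text, str) or not text.strip():
        # Missing key, wrong type, or whitespace-only caption.
        return default
    return text.strip().upper()


if __name__ == "__main__":
    print(format_caption([{"generated_text": "a dog on a beach"}]))
    print(repr(format_caption([], default="NO CAPTION AVAILABLE")))
```

Under these assumptions, the last line of `caption_image` could return `format_caption(result)` in place of the direct indexing; a separate check would still be needed for the case where no image has been uploaded and `image` is `None`.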
data/image1.jpg ADDED

Git LFS Details

  • SHA256: afddd13df8628271d0639f79cd051cbc78dee191c3d51aebba64b5e40d504e7c
  • Pointer size: 131 Bytes
  • Size of remote file: 126 kB
data/image10.jpg ADDED

Git LFS Details

  • SHA256: 351e533b52e199388c6080d59ec5cf68385ee0bcc290c9e7cefcdcb22cbe905a
  • Pointer size: 130 Bytes
  • Size of remote file: 66.6 kB
data/image2.png ADDED

Git LFS Details

  • SHA256: bd6687193878f1073398b6ebbadbe90f3a662128f61bfe3f4e16eee0dc767ac6
  • Pointer size: 132 Bytes
  • Size of remote file: 3.18 MB
data/image3.jpg ADDED

Git LFS Details

  • SHA256: 7bc09bf1bf72768fbbad7c953bd18da7514f7fc7a7f7050e8a55a4c123e1b21c
  • Pointer size: 131 Bytes
  • Size of remote file: 154 kB
data/image4.jpg ADDED

Git LFS Details

  • SHA256: 0e2f2bed2c4e0f453a56372de0bfb3a940424520efcd1dd9a4c452beb4dfa32d
  • Pointer size: 130 Bytes
  • Size of remote file: 57.6 kB
data/image5.jpg ADDED

Git LFS Details

  • SHA256: 20eb9e64f705dcae37dd4939eb9f5b508f4ac745fd9207b42a76fc921c01657d
  • Pointer size: 130 Bytes
  • Size of remote file: 76.2 kB
data/image6.png ADDED

Git LFS Details

  • SHA256: 5110e2680b48add2ed24c138d378a7167d31ea0b2494df98ab115270aef20976
  • Pointer size: 132 Bytes
  • Size of remote file: 2.55 MB
data/image7.png ADDED

Git LFS Details

  • SHA256: 5bef6a1e28beab196b8795c496af099ff32bc176bdf3faf610e06669f05cee7d
  • Pointer size: 132 Bytes
  • Size of remote file: 2.68 MB
data/image8.jpeg ADDED

Git LFS Details

  • SHA256: a5d0cb6da3084065fe7d666b4e7fbdd1207cd6e111446e0320f5c8dba70a25ca
  • Pointer size: 130 Bytes
  • Size of remote file: 21.4 kB
data/image9.jpeg ADDED

Git LFS Details

  • SHA256: 36924c77048fc6ca2afe5f4272d2c9fe470670241560c9fc3e2339311bfff81f
  • Pointer size: 130 Bytes
  • Size of remote file: 15.5 kB
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ transformers
+ gradio
+ python-dotenv
+ pillow
+ torch
+ sentencepiece
+ huggingface_hub