Prince9191 committed on
Commit
740d62f
·
verified ·
1 Parent(s): 34fcec8

Upload 9 files

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ class_object.PNG filter=lfs diff=lfs merge=lfs -text
+ kid_bike.jpeg filter=lfs diff=lfs merge=lfs -text
+ Object_detection.png filter=lfs diff=lfs merge=lfs -text
+ OD_test.png filter=lfs diff=lfs merge=lfs -text
+ pipeline.PNG filter=lfs diff=lfs merge=lfs -text
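The added `.gitattributes` lines route the five image files through Git LFS by giving each path the `filter=lfs` attribute. As a rough illustration of how such lines can be read, the sketch below lists the patterns that carry that attribute. The function name `lfs_patterns` and the sample text are assumptions made for this example, and the parser ignores quoted patterns containing spaces.

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First token is the path pattern, the rest are attributes.
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Sample text for illustration; the last line is not LFS-tracked.
sample = """\
*.zip filter=lfs diff=lfs merge=lfs -text
kid_bike.jpeg filter=lfs diff=lfs merge=lfs -text
*.md text
"""
print(lfs_patterns(sample))
```

Only the first two sample patterns are returned, because the `*.md` line has no `filter=lfs` attribute.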
OD_test.png ADDED

Git LFS Details

  • SHA256: 5dd41badb3f8c639d3aae132743102e5c8f4f635f3d9fd759be866195c410cb0
  • Pointer size: 132 Bytes
  • Size of remote file: 1.19 MB
Object_detection.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
Object_detection.png ADDED

Git LFS Details

  • SHA256: 87ac554e34e7d15ec67f3bd57da64a1df9a9c222fadb781756a4b73313c72da1
  • Pointer size: 131 Bytes
  • Size of remote file: 534 kB
README.md CHANGED
@@ -1,14 +1,104 @@
- ---
- title: Object-Detection-fb Basic
- emoji: 🦀
- colorFrom: gray
- colorTo: yellow
- sdk: gradio
- sdk_version: 5.25.2
- app_file: app.py
- pinned: false
- license: mit
- short_description: A Object Identification and text to speech model
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # An Object Identification and Text-to-Speech Model Using Hugging Face Transformers
+
+ Learn how to build the pipeline below using Gradio (for the user interface and deployment), Facebook's detr-resnet-50 model for object identification, and kakao-enterprise's vits-ljs model for text-to-speech.
+
+ <img src="pipeline.PNG" alt="project pipeline" width="400">
+
+
+ ## Description
+
+ The Image Description and Audio Transcript App is a web-based application that uses AI to generate descriptions of uploaded images. It also reads each description aloud as an audio file, which makes the app more accessible to users with visual impairments.
+
+ This app uses BLIP (Bootstrapping Language-Image Pre-training) for image captioning and gTTS (Google Text-to-Speech) to convert the description into an audio file.
+
+ ## Features
+
+ * Upload an image and receive an AI-generated description.
+ * Convert the description into an audio file for accessibility.
+ * Responsive web interface built using Gradio.
+ * Simple, user-friendly design for a seamless experience.
+
+ ## Technologies Used
+
+ * Programming Language: Python 3.7+
+ * AI Model: BLIP for image captioning
+ * Text-to-Speech: gTTS (Google Text-to-Speech)
+ * Web Interface: Gradio
+ * Libraries: PyTorch, Transformers, Gradio, gTTS
+
+ ## Libraries and Dependencies
+
+ * torch: Deep learning framework for the BLIP model
+ * transformers: Hugging Face library for pre-trained models like BLIP
+ * gtts: Library for text-to-speech conversion
+ * gradio: For building the web interface
+
+ To install the necessary packages, run:
+
+ ```bash
+ pip install torch transformers gtts gradio
+ ```
+
+ ## Installation and Setup
+
+ * Clone the repository:
+ ```bash
+ git clone https://github.com/your-username/image-description-audio-transcript.git
+ cd image-description-audio-transcript
+ ```
+
+ * Create a virtual environment (optional but recommended):
+ ```bash
+ python -m venv venv
+ source venv/bin/activate # On Windows, use venv\Scripts\activate
+ ```
+
+ * Install the required packages:
+ ```bash
+ pip install torch transformers gtts gradio
+ ```
+
+ * No manual model download is needed: the BLIP model is downloaded automatically the first time the script runs, and gTTS calls an online service to convert text to speech, so an internet connection is required.
+
+ ## Usage
+
+ 1. Run the application:
+ ```bash
+ python object_detection.py
+ ```
+ 2. Open a web browser and navigate to http://127.0.0.1:7860 to access the app.
+ 3. Upload an image through the provided input.
+ 4. Click the `Generate Description` button to get a text description of the image.
+ 5. Click the `Click here for an audio transcript` button to hear the description.
+
+ ## Configuration
+
+ You can pass the following parameters to `demo.launch()` in the object_detection.py file:
+
+ * server_name: The IP address on which the server runs (default: '127.0.0.1')
+ * server_port: The port number (default: 7860)
+ * debug: Debug mode for development (default: False)
+
+ ## Contributing
+
+ Contributions to improve the Image Description and Audio Transcript App are welcome. Please follow these steps:
+
+ * Fork the repository.
+ * Create a new branch (`git checkout -b feature/AmazingFeature`).
+ * Commit your changes (`git commit -m 'Add some AmazingFeature'`).
+ * Push to the branch (`git push origin feature/AmazingFeature`).
+ * Open a Pull Request.
+
+ ## License
+
+ This project is licensed under the MIT License - see the LICENSE.md file for details.
+
+ ## Acknowledgments
+
+ * Salesforce for the BLIP image captioning model.
+ * Google for the gTTS service.
+ * Gradio for the easy-to-use interface framework.
+
+ ## Disclaimer
+
+ This tool assists with generating image descriptions and audio transcripts. Always review the generated content for accuracy and appropriateness before use.
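The README's Configuration section names a host, a port, and a debug flag. One possible way to supply them, sketched below, is to build the keyword arguments for Gradio's `launch()` (`server_name`, `server_port`, `debug`) from environment variables. The variable names `APP_HOST`, `APP_PORT`, and `APP_DEBUG` and the helper `launch_kwargs` are hypothetical and do not appear in the commit.

```python
import os


def launch_kwargs(env=None):
    """Build keyword arguments for Gradio's launch() from environment variables.

    APP_HOST, APP_PORT and APP_DEBUG are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    port = int(env.get("APP_PORT", "7860"))
    # Reject ports outside the valid TCP range early, with a clear message.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": port,
        "debug": env.get("APP_DEBUG", "0") == "1",
    }


# With no variables set, the sketch falls back to local defaults.
print(launch_kwargs({}))
```

In object_detection.py this could be used as `demo.launch(**launch_kwargs())`, which keeps the defaults local-only unless the environment says otherwise.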
class_object.PNG ADDED

Git LFS Details

  • SHA256: db9cba36b187cc3fb7990d0a787c376116661a711c4caefdca66d8406fb2310d
  • Pointer size: 131 Bytes
  • Size of remote file: 509 kB
helper.py ADDED
@@ -0,0 +1,103 @@
+ # -*- coding: utf-8 -*-
+ """helper.ipynb
+
+ Automatically generated by Colaboratory.
+
+ Original file is located at
+     https://colab.research.google.com/drive/1IDhEhDLbnCTaBfIbuMtlNFW3ntQiZBwA
+ """
+
+ import io
+ import matplotlib.pyplot as plt
+ import requests
+ import inflect
+ from PIL import Image
+
+ def load_image_from_url(url):
+     return Image.open(requests.get(url, stream=True).raw)
+
+ def render_results_in_image(in_pil_img, in_results):
+     plt.figure(figsize=(16, 10))
+     plt.imshow(in_pil_img)
+
+     ax = plt.gca()
+
+     for prediction in in_results:
+
+         x, y = prediction['box']['xmin'], prediction['box']['ymin']
+         w = prediction['box']['xmax'] - prediction['box']['xmin']
+         h = prediction['box']['ymax'] - prediction['box']['ymin']
+
+         ax.add_patch(plt.Rectangle((x, y),
+                                    w,
+                                    h,
+                                    fill=False,
+                                    color="green",
+                                    linewidth=2))
+         ax.text(
+            x,
+            y,
+            f"{prediction['label']}: {round(prediction['score']*100, 1)}%",
+            color='red'
+         )
+
+     plt.axis("off")
+
+     # Save the modified image to a BytesIO object
+     img_buf = io.BytesIO()
+     plt.savefig(img_buf, format='png',
+                 bbox_inches='tight',
+                 pad_inches=0)
+     img_buf.seek(0)
+     modified_image = Image.open(img_buf)
+
+     # Close the plot to prevent it from being displayed
+     plt.close()
+
+     return modified_image
+
+ def summarize_predictions_natural_language(predictions):
+     summary = {}
+     p = inflect.engine()
+
+     for prediction in predictions:
+         label = prediction['label']
+         if label in summary:
+             summary[label] += 1
+         else:
+             summary[label] = 1
+
+     result_string = "In this image, there are "
+     for i, (label, count) in enumerate(summary.items()):
+         count_string = p.number_to_words(count)
+         result_string += f"{count_string} {label}"
+         if count > 1:
+             result_string += "s"
+
+         result_string += " "
+
+         if i == len(summary) - 2:
+             result_string += "and "
+
+     # Strip the trailing space (no commas are added above) and end the sentence
+     result_string = result_string.rstrip(', ') + "."
+
+     return result_string
+
+
+ ##### To ignore warnings #####
+ import warnings
+ import logging
+ from transformers import logging as hf_logging
+
+ def ignore_warnings():
+     # Ignore specific Python warnings
+     warnings.filterwarnings("ignore", message="Some weights of the model checkpoint")
+     warnings.filterwarnings("ignore", message="Could not find image processor class")
+     warnings.filterwarnings("ignore", message="The `max_size` parameter is deprecated")
+
+     # Adjust logging for libraries using the logging module
+     logging.basicConfig(level=logging.ERROR)
+     hf_logging.set_verbosity_error()
+
+ ########
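`summarize_predictions_natural_language` in helper.py joins the counted labels with spaces only, always writes "there are", and leaves a dangling sentence when nothing is detected. The sketch below is one possible variant, not the author's implementation: it adds commas, matches the verb to the first count, and handles an empty list. It uses a small lookup table instead of `inflect` so that it runs without extra dependencies; the name `summarize_counts` and the sample data are assumptions for illustration.

```python
from collections import Counter

# Small lookup used instead of inflect so the sketch has no dependencies;
# counts above ten fall back to digits.
_NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                 "six", "seven", "eight", "nine", "ten"]


def summarize_counts(predictions):
    """Summarize detections as one sentence with commas and verb agreement."""
    counts = Counter(pred["label"] for pred in predictions)
    if not counts:
        return "No objects were detected in this image."

    parts = []
    for label, count in counts.items():
        number = _NUMBER_WORDS[count] if count < len(_NUMBER_WORDS) else str(count)
        # Naive pluralization (append "s"), the same rule helper.py uses.
        noun = label if count == 1 else label + "s"
        parts.append(f"{number} {noun}")

    # The verb agrees with the first item in the list.
    first_count = next(iter(counts.values()))
    verb = "is" if first_count == 1 else "are"

    if len(parts) == 1:
        listing = parts[0]
    elif len(parts) == 2:
        listing = " and ".join(parts)
    else:
        listing = ", ".join(parts[:-1]) + ", and " + parts[-1]
    return f"In this image, there {verb} {listing}."


# Sample detections in the shape helper.py expects (only 'label' is used here).
sample = [{"label": "dog"}, {"label": "dog"}, {"label": "cat"}, {"label": "bicycle"}]
print(summarize_counts(sample))
# prints "In this image, there are two dogs, one cat, and one bicycle."
```

Appending "s" still mishandles labels such as "person" or "bus"; a pluralization library would be the natural next step if that matters.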
kid_bike.jpeg ADDED

Git LFS Details

  • SHA256: 72afae9ce5cafa9045a5549fbb7a27b356e0725e48072cee2b17044ba14f68e8
  • Pointer size: 131 Bytes
  • Size of remote file: 284 kB
object_detection.py ADDED
@@ -0,0 +1,68 @@
+ import subprocess
+ import sys
+ import tempfile
+
+ def ensure_package_installed(package_name):
+     try:
+         __import__(package_name)
+     except ImportError:
+         print(f"{package_name} package not found. Installing...")
+         subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
+         __import__(package_name)
+
+ # Install any missing third-party packages before importing them below
+ ensure_package_installed("gradio")
+ ensure_package_installed("transformers")
+ ensure_package_installed("gtts")
+
+ import torch
+ from transformers import BlipProcessor, BlipForConditionalGeneration
+ from gtts import gTTS
+ import gradio
+
+ # Load the image captioning model
+ processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+ model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
+
+ def generate_description(image):
+     """Generates a textual description of the given image using a pre-trained BLIP model."""
+     inputs = processor(image, return_tensors="pt").to(model.device)
+     output = model.generate(**inputs)
+     description = processor.decode(output[0], skip_special_tokens=True)
+     return description
+
+ def text_to_speech(text):
+     """Converts text to speech using gTTS and returns the audio file path."""
+     tts = gTTS(text=text, lang='en')
+     with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as temp_audio:
+         audio_path = temp_audio.name
+     tts.save(audio_path)  # the handle is closed, so gTTS can reopen the path
+     return audio_path
+
+ def process_image(image):
+     """Processes the uploaded image and returns its generated description."""
+     description = generate_description(image)
+     return description
+
+ def get_audio(description):
+     """Generates the audio file for the given description."""
+     return text_to_speech(description)
+
+ # Build Gradio Interface
+ with gradio.Blocks() as demo:
+     gradio.Markdown("# Image Description and Audio Transcript App")
+     gradio.Markdown("Upload an image to get an AI-generated description. Click the button to hear the description.")
+
+     with gradio.Row():
+         image_input = gradio.Image(type="pil")
+         text_output = gradio.Textbox(label="Generated Description")
+
+     generate_btn = gradio.Button("Generate Description")
+     audio_btn = gradio.Button("Click here for an audio transcript")
+     audio_output = gradio.Audio()
+
+     generate_btn.click(process_image, inputs=[image_input], outputs=[text_output])
+     audio_btn.click(get_audio, inputs=[text_output], outputs=[audio_output])
+
+ # Launch the Gradio app
+ demo.launch()
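`text_to_speech` keeps its temporary MP3 on disk (`delete=False`) so that the Gradio audio component can read the file after the function returns. The sketch below isolates that pattern: reserve a path, close the handle, then let a writer fill the file. `save_to_temp_file` and `fake_tts_save` are hypothetical names, and `fake_tts_save` merely stands in for a call such as gTTS's `save(path)` so that the example runs offline.

```python
import os
import tempfile


def save_to_temp_file(write_fn, suffix=".mp3"):
    """Reserve a temporary path, close the handle, then let write_fn fill the file."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        path = tmp.name
    # The handle is closed here, so another writer can open the path
    # (reopening a still-open temporary file fails on Windows).
    write_fn(path)
    return path


def fake_tts_save(path):
    # Hypothetical stand-in for a text-to-speech save call; writes placeholder bytes.
    with open(path, "wb") as f:
        f.write(b"placeholder audio bytes")


audio_path = save_to_temp_file(fake_tts_save)
print(audio_path.endswith(".mp3"), os.path.getsize(audio_path) > 0)
os.remove(audio_path)  # the caller is responsible for cleanup when delete=False
```

Because nothing deletes these files automatically, a long-running app may want to remove old audio files periodically.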
pipeline.PNG ADDED

Git LFS Details

  • SHA256: 51294dfc4e36ff8148c0781d3e3b8912369e806c035ef50757357ae028d5f900
  • Pointer size: 131 Bytes
  • Size of remote file: 752 kB