Space: Prince9191/Object-Detection-fb_basic

Prince9191 committed on Apr 17, 2025

Commit 740d62f (verified) · 1 Parent(s): 34fcec8

Upload 9 files

Files changed (10)

.gitattributes +5 -0
OD_test.png +3 -0
Object_detection.ipynb +0 -0
Object_detection.png +3 -0
README.md +104 -14
class_object.PNG +3 -0
helper.py +103 -0
kid_bike.jpeg +3 -0
object_detection.py +68 -0
pipeline.PNG +3 -0

.gitattributes CHANGED

@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+class_object.PNG filter=lfs diff=lfs merge=lfs -text
+kid_bike.jpeg filter=lfs diff=lfs merge=lfs -text
+Object_detection.png filter=lfs diff=lfs merge=lfs -text
+OD_test.png filter=lfs diff=lfs merge=lfs -text
+pipeline.PNG filter=lfs diff=lfs merge=lfs -text

OD_test.png ADDED

Git LFS Details

SHA256: 5dd41badb3f8c639d3aae132743102e5c8f4f635f3d9fd759be866195c410cb0
Pointer size: 132 Bytes
Size of remote file: 1.19 MB

Object_detection.ipynb ADDED

The diff for this file is too large to render. See raw diff

Object_detection.png ADDED

Git LFS Details

SHA256: 87ac554e34e7d15ec67f3bd57da64a1df9a9c222fadb781756a4b73313c72da1
Pointer size: 131 Bytes
Size of remote file: 534 kB

README.md CHANGED

@@ -1,14 +1,104 @@
----
-title: Object-Detection-fb Basic
-emoji: 🦀
-colorFrom: gray
-colorTo: yellow
-sdk: gradio
-sdk_version: 5.25.2
-app_file: app.py
-pinned: false
-license: mit
-short_description: A Object Identification and text to speech model
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# An Object Identification and Text-to-Speech Model Using Hugging Face Transformers
+Learn how to build the pipeline below using Gradio (for the user interface and deployment), Facebook's detr-resnet-50 model for object identification, and kakao-enterprise's vits-ljs model for text to speech.
+<img src="pipeline.PNG" alt="project pipeline" width="400">
+## Description
+The Image Description and Audio Transcript App is a web-based application that leverages artificial intelligence to generate descriptions for uploaded images. Additionally, the tool provides an audio transcript of the description for users with visual impairments, making it more accessible.
+This app uses BLIP (Bootstrapping Language-Image Pre-training) for image captioning and gTTS (Google Text-to-Speech) for converting the description into an audio file.
+## Features
+* Upload an image and receive an AI-generated description.
+* Convert the description into an audio file for accessibility.
+* Responsive web interface built using Gradio.
+* Simple, user-friendly design for a seamless experience.
+## Technologies Used
+* Programming Language: Python 3.7+
+* AI Model: BLIP for image captioning
+* Text-to-Speech: gTTS (Google Text-to-Speech)
+* Web Interface: Gradio
+* Libraries: PyTorch, Transformers, Gradio, gTTS
+## Libraries and Dependencies
+* torch: Deep learning framework for the BLIP model
+* transformers: Hugging Face library for pre-trained models like BLIP
+* gtts: Library for text-to-speech conversion
+* gradio: For building the web interface
+## To install the necessary packages, run:
+```bash
+pip install torch transformers gtts gradio
+```
+## Installation and Setup
+* Clone the repository:
+```bash
+git clone https://github.com/your-username/image-description-audio-transcript.git
+cd image-description-audio-transcript
+```
+* Create a virtual environment (optional but recommended):
+```bash
+python -m venv venv
+source venv/bin/activate  # On Windows, use venv\Scripts\activate
+```
+* Install the required packages:
+```bash
+pip install torch transformers gtts gradio
+```
+* No manual model download is needed: the BLIP model is downloaded automatically the first time the script runs, and gTTS calls an online service to convert text to speech, so an internet connection is required.
+## Usage
+1. Run the application:
+```bash
+python object_detection.py
+```
+2. Open a web browser and navigate to http://127.0.0.1:7860 to access the app.
+3. Upload an image through the provided input.
+4. Click the `Generate Description` button to get a text description of the image.
+5. Click the `Click here for an audio transcript` button to hear the description.
+## Configuration
+You can modify the following parameters by passing them to `demo.launch()` in the object_detection.py file:
+* server_name: The IP address on which the server runs (default: '127.0.0.1')
+* server_port: The port number (default: 7860)
+* debug: Debug mode for development (default: False)
+## Contributing
+Contributions to improve the Image Description and Audio Transcript App are welcome. Please follow these steps:
+* Fork the repository.
+* Create a new branch (`git checkout -b feature/AmazingFeature`).
+* Commit your changes (`git commit -m 'Add some AmazingFeature'`).
+* Push to the branch (`git push origin feature/AmazingFeature`).
+* Open a Pull Request.
+## License
+This project is licensed under the MIT License - see the LICENSE.md file for details.
+## Acknowledgments
+* Salesforce for the BLIP image captioning model.
+* Google for the gTTS service.
+* Gradio for the easy-to-use interface framework.
+## Disclaimer
+This tool is designed to assist with generating descriptions and audio transcripts from images, but always review the generated content for accuracy and appropriateness before use.
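
The Configuration section of the README above describes host, port, and debug settings. In Gradio these correspond to the `server_name`, `server_port`, and `debug` arguments of `launch()`. The following is a minimal sketch of one way to wire them up, not part of the commit: the helper names `build_launch_kwargs` and `launch_app` and the environment variables `APP_HOST`, `APP_PORT`, and `APP_DEBUG` are assumptions made for illustration.

```python
import os


def build_launch_kwargs(env=None):
    """Translate host/port/debug settings into Gradio launch() keyword arguments.

    APP_HOST, APP_PORT and APP_DEBUG are hypothetical variable names chosen
    for this sketch; they are not read by Gradio or by the committed script.
    """
    env = os.environ if env is None else env
    return {
        "server_name": env.get("APP_HOST", "127.0.0.1"),
        "server_port": int(env.get("APP_PORT", "7860")),
        "debug": env.get("APP_DEBUG", "0").lower() in ("1", "true", "yes"),
    }


def launch_app(demo, env=None):
    """Launch a Gradio Blocks/Interface object with the resolved settings."""
    return demo.launch(**build_launch_kwargs(env))
```

Under these assumptions, the final line of object_detection.py would call `launch_app(demo)` so that the address and port can be changed without editing the script.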

class_object.PNG ADDED

Git LFS Details

SHA256: db9cba36b187cc3fb7990d0a787c376116661a711c4caefdca66d8406fb2310d
Pointer size: 131 Bytes
Size of remote file: 509 kB

helper.py ADDED

	@@ -0,0 +1,103 @@

+# -*- coding: utf-8 -*-
+"""helper.ipynb
+Automatically generated by Colaboratory.
+Original file is located at
+    https://colab.research.google.com/drive/1IDhEhDLbnCTaBfIbuMtlNFW3ntQiZBwA
+"""
+import io
+import matplotlib.pyplot as plt
+import requests
+import inflect
+from PIL import Image
+def load_image_from_url(url):
+    return Image.open(requests.get(url, stream=True).raw)
+def render_results_in_image(in_pil_img, in_results):
+    plt.figure(figsize=(16, 10))
+    plt.imshow(in_pil_img)
+    ax = plt.gca()
+    for prediction in in_results:
+        x, y = prediction['box']['xmin'], prediction['box']['ymin']
+        w = prediction['box']['xmax'] - prediction['box']['xmin']
+        h = prediction['box']['ymax'] - prediction['box']['ymin']
+        ax.add_patch(plt.Rectangle((x, y),
+                                   w,
+                                   h,
+                                   fill=False,
+                                   color="green",
+                                   linewidth=2))
+        ax.text(
+           x,
+           y,
+           f"{prediction['label']}: {round(prediction['score']*100, 1)}%",
+           color='red'
+        )
+    plt.axis("off")
+    # Save the modified image to a BytesIO object
+    img_buf = io.BytesIO()
+    plt.savefig(img_buf, format='png',
+                bbox_inches='tight',
+                pad_inches=0)
+    img_buf.seek(0)
+    modified_image = Image.open(img_buf)
+    # Close the plot to prevent it from being displayed
+    plt.close()
+    return modified_image
+def summarize_predictions_natural_language(predictions):
+    summary = {}
+    p = inflect.engine()
+    for prediction in predictions:
+        label = prediction['label']
+        if label in summary:
+            summary[label] += 1
+        else:
+            summary[label] = 1
+    result_string = "In this image, there are "
+    for i, (label, count) in enumerate(summary.items()):
+        count_string = p.number_to_words(count)
+        result_string += f"{count_string} {label}"
+        if count > 1:
+            result_string += "s"
+        result_string += " "
+        if i == len(summary) - 2:
+            result_string += "and "
+    # Strip the trailing space (no commas are added above) and end the sentence
+    result_string = result_string.rstrip(', ') + "."
+    return result_string
+##### To ignore warnings #####
+import warnings
+import logging
+from transformers import logging as hf_logging
+def ignore_warnings():
+    # Ignore specific Python warnings
+    warnings.filterwarnings("ignore", message="Some weights of the model checkpoint")
+    warnings.filterwarnings("ignore", message="Could not find image processor class")
+    warnings.filterwarnings("ignore", message="The `max_size` parameter is deprecated")
+    # Adjust logging for libraries using the logging module
+    logging.basicConfig(level=logging.ERROR)
+    hf_logging.set_verbosity_error()
+########
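
The label-counting logic in `summarize_predictions_natural_language` can be tried without a detection model by feeding it hand-written predictions. The sketch below is a standard-library variation for experimentation, not the committed helper: the function name `summarize_counts` is hypothetical, a small lookup table stands in for `inflect`, and it additionally separates three or more items with commas and handles an empty prediction list.

```python
from collections import Counter

# Small lookup used only in this sketch; the committed helper uses inflect.
_NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def summarize_counts(predictions):
    """Turn a list of {'label': ...} predictions into one English sentence."""
    counts = Counter(p["label"] for p in predictions)
    if not counts:
        return "No objects were detected in this image."
    parts = []
    for label, count in counts.items():
        word = _NUMBER_WORDS.get(count, str(count))
        # Naive pluralisation (append "s"), as in the committed helper.
        noun = label if count == 1 else label + "s"
        parts.append(f"{word} {noun}")
    if len(parts) == 1:
        listing = parts[0]
    else:
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"In this image, there are {listing}."


sample = [{"label": "person"}, {"label": "bicycle"}, {"label": "person"}]
print(summarize_counts(sample))
# In this image, there are two persons and one bicycle.
```

The naive plural is visible in the output ("persons"); a library such as `inflect` could be used for the noun as well as the number if irregular plurals matter.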

kid_bike.jpeg ADDED

Git LFS Details

SHA256: 72afae9ce5cafa9045a5549fbb7a27b356e0725e48072cee2b17044ba14f68e8
Pointer size: 131 Bytes
Size of remote file: 284 kB

object_detection.py ADDED

	@@ -0,0 +1,68 @@

+import subprocess
+import sys
+import tempfile
+def ensure_package_installed(package_name):
+    try:
+        __import__(package_name)
+    except ImportError:
+        print(f"{package_name} package not found. Installing...")
+        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
+        __import__(package_name)
+# Check for (and install if missing) the third-party packages before importing them
+ensure_package_installed("gradio")
+ensure_package_installed("transformers")
+ensure_package_installed("gtts")
+import torch  # PyTorch backend used by return_tensors="pt"
+import gradio
+from gtts import gTTS
+from transformers import BlipProcessor, BlipForConditionalGeneration
+# Load the image captioning model
+processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
+model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
+def generate_description(image):
+    """Generates a textual description of the given image using a pre-trained BLIP model."""
+    inputs = processor(image, return_tensors="pt").to(model.device)
+    output = model.generate(**inputs)
+    description = processor.decode(output[0], skip_special_tokens=True)
+    return description
+def text_to_speech(text):
+    """Converts text to speech using gTTS and returns the audio file path."""
+    tts = gTTS(text=text, lang='en')
+    # Reserve a file name, then close the handle so it is not leaked; gTTS reopens the path to write
+    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp3") as temp_audio:
+        audio_path = temp_audio.name
+    tts.save(audio_path)
+    return audio_path
+def process_image(image):
+    """Processes the uploaded image and returns its generated description."""
+    description = generate_description(image)
+    return description
+def get_audio(description):
+    """Generates the audio file for the given description."""
+    return text_to_speech(description)
+# Build Gradio Interface
+with gradio.Blocks() as demo:
+    gradio.Markdown("# Image Description and Audio Transcript App")
+    gradio.Markdown("Upload an image to get an AI-generated description. Click the button to hear the description.")
+    with gradio.Row():
+        image_input = gradio.Image(type="pil")
+        text_output = gradio.Textbox(label="Generated Description")
+    generate_btn = gradio.Button("Generate Description")
+    audio_btn = gradio.Button("Click here for an audio transcript")
+    audio_output = gradio.Audio()
+    generate_btn.click(process_image, inputs=[image_input], outputs=[text_output])
+    audio_btn.click(get_audio, inputs=[text_output], outputs=[audio_output])
+# Launch the Gradio app
+demo.launch()
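
An install-if-missing check such as `ensure_package_installed` only helps when it runs before the package is imported at module level. The sketch below shows one way to express that check with `importlib`; it is an illustration rather than the committed code, and the `pip_name` parameter is a hypothetical addition for packages whose pip name differs from their import name.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package_installed(import_name, pip_name=None):
    """Return the imported module, installing it with pip first if it is missing.

    pip_name is a hypothetical extra parameter for this sketch; it defaults
    to the import name when the two are the same.
    """
    if importlib.util.find_spec(import_name) is None:
        print(f"{import_name} package not found. Installing...")
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or import_name]
        )
        # Make the freshly installed package visible to the import system.
        importlib.invalidate_caches()
    return importlib.import_module(import_name)


# Already-available modules are returned without invoking pip.
json_module = ensure_package_installed("json")
```

Installing packages at runtime is convenient for notebooks; for a deployed Space, declaring dependencies in a requirements file is generally the more predictable choice.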

pipeline.PNG ADDED

Git LFS Details

SHA256: 51294dfc4e36ff8148c0781d3e3b8912369e806c035ef50757357ae028d5f900
Pointer size: 131 Bytes
Size of remote file: 752 kB