
metadata
title: Project2_Image_Captioning
emoji: 👀
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

README

Project Objectives

The primary objective of this project is to build an interactive application in which a user uploads an image or draws a sketch, receives a generated caption in English, and sees that caption translated into Arabic.

Description of Implemented Pipelines

The application utilizes two key pipelines from the Hugging Face Transformers library:

  1. Image Captioning Pipeline:

    • Model: Salesforce/blip-image-captioning-base
    • This model takes an image as input and generates a descriptive caption in English. BLIP is a vision-language model: it encodes the content of the image and decodes a short, coherent sentence describing it.
  2. Translation Pipeline:

    • Model: facebook/nllb-200-distilled-600M
    • This model translates the generated English caption into Arabic. The goal is a translation that retains the meaning and context of the original caption, so the output is usable by Arabic-speaking users.
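
The two pipelines above can be chained as in the following minimal sketch. It is not taken from this Space's app.py, which is not shown here. It assumes the Transformers 4.x task names "image-to-text" and "translation", and the NLLB language codes eng_Latn (English) and arb_Arab (Modern Standard Arabic). The function names load_pipelines and caption_and_translate are hypothetical.

```python
def load_pipelines():
    """Load the two models named above (weights are downloaded on first call)."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    captioner = pipeline(
        "image-to-text",
        model="Salesforce/blip-image-captioning-base",
    )
    translator = pipeline(
        "translation",
        model="facebook/nllb-200-distilled-600M",
        src_lang="eng_Latn",  # assumption: captions are plain English
        tgt_lang="arb_Arab",  # assumption: Modern Standard Arabic as the target
    )
    return captioner, translator


def caption_and_translate(image, captioner, translator):
    """Return (english_caption, arabic_translation) for one image.

    Both pipelines return a list with one dict per input.
    """
    english = captioner(image)[0]["generated_text"].strip()
    arabic = translator(english)[0]["translation_text"].strip()
    return english, arabic
```

Passing the pipelines in as arguments means the models are loaded once at startup rather than on every request, and the chaining logic can be exercised with simple stand-in callables.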

Instructions for Using the Interface

  1. Input Options:

    • Upload Tab: Users can upload an image from their device.
    • Sketch Tab: Users can draw a sketch of an object using the sketchpad.
  2. Generating Captions:

    • After inputting an image or sketch, click the "Submit" button.
    • The application will process the input and display the generated English caption and its Arabic translation in the output fields.
  3. Examples:

    • The "Example Prompts" section provides sample images. Clicking on these examples will populate the upload interface with the selected image.
  4. Clearing Inputs:

    • Use the "Clear" button to reset the inputs in both tabs.
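
A layout like the one described above could be assembled roughly as follows. This is a sketch under assumptions, not the Space's actual code: it targets the Gradio 4.x API (the metadata pins sdk_version 4.44.1), uses one gr.Interface per tab because that component supplies "Submit" and "Clear" buttons by default, and omits the example images because their file paths are not given. The names sketch_to_image and build_demo are hypothetical, and caption_fn stands for any function that maps an image to an (English, Arabic) pair.

```python
def sketch_to_image(value):
    """Extract the flattened drawing from a sketchpad value.

    In Gradio 4.x the sketchpad passes a dict with "background", "layers"
    and "composite" keys; a plain image or None is returned unchanged.
    """
    if isinstance(value, dict):
        return value.get("composite")
    return value


def build_demo(caption_fn):
    """Build a two-tab demo; caption_fn(image) -> (english, arabic)."""
    import gradio as gr  # imported lazily; only needed when the UI is built

    def make_outputs():
        # Each Interface needs its own output components.
        return [
            gr.Textbox(label="English caption"),
            gr.Textbox(label="Arabic translation"),
        ]

    upload_tab = gr.Interface(
        fn=caption_fn,
        inputs=gr.Image(type="pil"),
        outputs=make_outputs(),
    )
    sketch_tab = gr.Interface(
        fn=lambda value: caption_fn(sketch_to_image(value)),
        inputs=gr.Sketchpad(type="pil"),
        outputs=make_outputs(),
    )
    return gr.TabbedInterface(
        [upload_tab, sketch_tab],
        tab_names=["Upload", "Sketch"],
    )
```

Calling build_demo(my_caption_fn).launch() would then serve the interface, where my_caption_fn is a placeholder for the captioning function.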

Justifications for Model and Pipeline Choices

  • Image Captioning: The chosen model (Salesforce/blip-image-captioning-base) is the smaller "base" BLIP captioning checkpoint. It generates descriptive captions for a wide variety of images while remaining light enough for an interactive demo, which makes it suitable for diverse user inputs.

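  • Translation: The facebook/nllb-200-distilled-600M model is selected because NLLB-200 covers 200 languages, including Arabic, in a single model, and the distilled 600M-parameter checkpoint is a compact member of that family. This gives Arabic users a contextually relevant translation without a dedicated English-to-Arabic model.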

Bilingual Implementation

The application is bilingual by design: a dedicated translation pipeline converts every English caption into Arabic, and both versions are shown side by side. This makes the output of the image captioning model accessible to Arabic-speaking users.