---
license: apache-2.0
title: RAG Image
sdk: gradio
emoji: 📚
colorFrom: green
colorTo: blue
---
```yaml
title: RAG Image Captioning
emoji: 📸
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
```
# RAG Image Captioning Space
This Space hosts a retrieval-augmented generation (RAG) image captioning pipeline built on CLIP (`openai/clip-vit-base-patch32`), T5 (`t5-small`), and SentenceTransformer (`all-MiniLM-L6-v2`). For each image, it retrieves similar captions from a FAISS index and then uses T5 to generate the final caption.
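The retrieve-then-generate flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Space's code: a brute-force cosine-similarity search stands in for the FAISS index, short toy vectors stand in for the CLIP or SentenceTransformer embeddings, and the helper names (`retrieve_top_k`, `build_prompt`) and the T5 prompt format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    captions: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force stand-in for a FAISS search: rank every caption by similarity."""
    scored = [(captions[i], cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(retrieved: Sequence[Tuple[str, float]]) -> str:
    """Hypothetical T5 prompt: a task prefix followed by the retrieved captions."""
    context = " | ".join(caption for caption, _ in retrieved)
    return f"generate caption: {context}"


if __name__ == "__main__":
    captions = ["a dog on a beach", "a cat on a sofa", "a dog running on sand"]
    corpus = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy caption embeddings
    top = retrieve_top_k([1.0, 0.0], corpus, captions, k=2)  # toy query embedding
    print(build_prompt(top))
    # generate caption: a dog on a beach | a dog running on sand
```

With a real FAISS index, a flat inner-product index over L2-normalized vectors produces the same ranking as this cosine search; the resulting prompt would then be passed to T5 for generation.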
## Usage

- Upload an image through the Gradio interface to generate a caption.
- Call the API endpoint (`/api/predict`) to integrate the model into web or mobile apps.

## Files

- `app.py`: Gradio interface for the Space.
- `inference.py`: custom inference script that provides `generate_rag_caption`.
- `requirements.txt`: Python dependencies.
- `faiss_index.idx`: FAISS index used for retrieval.
- `captions.json`: caption corpus.
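Retrieval depends on `faiss_index.idx` and `captions.json` staying aligned. A minimal sketch, assuming `captions.json` is a JSON list of strings whose positions match the vector ids stored in the index (the file's actual layout is not documented here); `parse_captions`, `load_captions`, and `ids_to_captions` are hypothetical helpers.

```python
import json
from typing import List, Sequence


def parse_captions(raw: str) -> List[str]:
    """Parse the caption corpus, assuming it is a JSON list of strings."""
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(c, str) for c in data):
        raise ValueError("expected captions.json to be a JSON list of strings")
    return data


def load_captions(path: str = "captions.json") -> List[str]:
    """Read and validate the caption corpus from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_captions(f.read())


def ids_to_captions(ids: Sequence[int], captions: Sequence[str]) -> List[str]:
    """Map FAISS result ids to caption strings.

    With FAISS, `faiss.read_index("faiss_index.idx")` loads the index and
    `index.search(queries, k)` returns (distances, ids); ids are -1 when fewer
    than k neighbours exist, so negative or out-of-range ids are skipped here.
    """
    return [captions[i] for i in ids if 0 <= i < len(captions)]
```

Pass one row of the returned ids (for example `ids[0]` for a single query). With FAISS installed, `faiss.read_index("faiss_index.idx").ntotal == len(load_captions())` is a quick check that the two files describe the same number of entries.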

## Setup

On the Space, dependencies are installed from `requirements.txt` and the `en_core_web_sm` spaCy model is downloaded automatically. To set up a local environment, run:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
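Before running the Space locally, it can help to confirm that the packages resolve. A minimal standard-library sketch using `importlib.util.find_spec`; the module list is an assumption inferred from the models named above (for example, `faiss-cpu` installs the `faiss` module), so adjust it to match the actual `requirements.txt`. `check_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, Iterable

# Assumed import names; edit to match requirements.txt.
# en_core_web_sm is importable once `python -m spacy download en_core_web_sm` has run.
EXPECTED_MODULES = (
    "gradio",
    "transformers",
    "sentence_transformers",
    "faiss",
    "spacy",
    "en_core_web_sm",
)


def check_modules(names: Iterable[str] = EXPECTED_MODULES) -> Dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name:<24}{'ok' if found else 'MISSING'}")
```

Only top-level module names are checked, because `find_spec` imports parent packages when given a dotted name.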

## API Integration

Send a POST request to `/api/predict` with a base64-encoded image:
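```python
import base64

import requests

# Replace with your Space's URL.
api_url = "https://your-username-rag-image-captioning.hf.space/api/predict"

# Read the image and encode it as a base64 data URL.
with open("test_image.jpg", "rb") as f:
    image_bytes = f.read()
base64_image = f"data:image/jpeg;base64,{base64.b64encode(image_bytes).decode()}"

# Inputs go in the "data" list; the caption is the first output.
response = requests.post(api_url, json={"data": [base64_image]}, timeout=60)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json()["data"][0])
```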
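The request example hard-codes `image/jpeg` in the data URL. A small sketch of helpers that guess the MIME type from the file extension with the standard-library `mimetypes` module and validate the `{"data": [...]}` response shape used in the request example. The helper names are hypothetical, and the response shape is taken from that example rather than from a documented schema.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any, Dict


def guess_image_mime(path: str) -> str:
    """Infer the image MIME type from the file extension (e.g. .png -> image/png)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return mime


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URL (data:<mime>;base64,<payload>)."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def file_to_data_url(path: str) -> str:
    """Read an image file and return it as a data URL with a guessed MIME type."""
    return bytes_to_data_url(Path(path).read_bytes(), guess_image_mime(path))


def extract_caption(payload: Dict[str, Any]) -> str:
    """Return the first output from a {"data": [...]} response, or raise if absent."""
    data = payload.get("data")
    if not isinstance(data, list) or not data:
        raise ValueError(f"unexpected response payload: {payload!r}")
    return str(data[0])
```

For example, `{"data": [file_to_data_url("test_image.png")]}` can be posted the same way, and `extract_caption(response.json())` replaces the direct indexing with a clearer error when the response has an unexpected shape.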