| Field | Value |
| --- | --- |
| license | apache-2.0 |
| title | RAG Image |
| sdk | gradio |
| emoji | 📚 |
| colorFrom | green |
| colorTo | blue |
```yaml
---
title: RAG Image Captioning
emoji: 📸
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
---
```
# RAG Image Captioning Space
This Space hosts a retrieval-augmented generation (RAG) image captioning model built from CLIP (`openai/clip-vit-base-patch32`), T5 (`t5-small`), and SentenceTransformer (`all-MiniLM-L6-v2`). For each image, it retrieves similar captions from a FAISS index and then uses T5 to generate the final caption.
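The retrieve-then-generate flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Space's code: it assumes the query and the corpus captions are already embedded in one shared vector space (how the Space combines its CLIP and SentenceTransformer embeddings is not described in this README), it uses exact cosine similarity in place of the FAISS index, and the helpers `top_k_captions` and `build_prompt`, along with the prompt wording, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_captions(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    captions: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k captions whose embeddings are most cosine-similar to the query.

    Brute-force exact search; an inner-product FAISS index over normalized
    vectors produces the same ranking at scale.
    """
    # Assumption: query and corpus vectors share one embedding space.
    q = _normalize(query)
    scored = []
    for caption, vec in zip(captions, corpus):
        v = _normalize(vec)
        scored.append((caption, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(retrieved: Sequence[Tuple[str, float]]) -> str:
    """Hypothetical T5 input: the retrieved captions joined into one context string."""
    context = " | ".join(caption for caption, _ in retrieved)
    return f"summarize: {context}"


# Toy example: 3-dimensional vectors stand in for real embeddings.
captions = ["a dog running on grass", "a cat on a sofa", "a dog playing fetch"]
corpus = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
query = [1.0, 0.2, 0.0]

hits = top_k_captions(query, corpus, captions, k=2)
print([c for c, _ in hits])
print(build_prompt(hits))
```

The retrieved captions give the generator grounded context; in the real Space, the final string would be passed to `t5-small` rather than printed.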
## Usage

- Upload an image through the Gradio interface to generate a caption.
- Call the API endpoint (`/api/predict`) to integrate the model into web or mobile apps.
## Files

- `app.py`: Gradio interface for the Space.
- `inference.py`: custom inference script containing `generate_rag_caption`.
- `requirements.txt`: Python dependencies.
- `faiss_index.idx`: FAISS index used for retrieval.
- `captions.json`: caption corpus.
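A FAISS search returns integer row ids (padded with -1 when the index holds fewer than k vectors), so those ids must be mapped back to caption text. The sketch below assumes, without confirmation from this README, that `captions.json` is a JSON array of strings stored in the same order as the vectors in `faiss_index.idx`; `load_captions` and `ids_to_captions` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import List, Sequence


def load_captions(path: str) -> List[str]:
    """Load the caption corpus; assumes the file holds a JSON array of strings."""
    with open(path, encoding="utf-8") as f:
        captions = json.load(f)
    if not isinstance(captions, list):
        raise ValueError(f"{path} is expected to contain a JSON array")
    return captions


def ids_to_captions(ids: Sequence[int], captions: Sequence[str]) -> List[str]:
    """Map FAISS row ids to caption text, skipping -1 padding and out-of-range ids."""
    return [captions[i] for i in ids if 0 <= i < len(captions)]


# Write a tiny stand-in corpus so the example runs without the real file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "captions.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(["a red car", "two people hiking", "a bowl of fruit"], f)
    corpus = load_captions(path)

# In the Space, ids would presumably come from the id array returned by index.search(queries, k).
print(ids_to_captions([2, 0, -1], corpus))  # ['a bowl of fruit', 'a red car']
```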
## Setup

Dependencies are installed from `requirements.txt`, and the `en_core_web_sm` spaCy model is downloaded automatically. To install them manually, run:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```
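The README does not show how the automatic model download is done. One common pattern, sketched here as an assumption, checks at startup whether the `en_core_web_sm` package is importable and, if it is not, runs the same `python -m spacy download` command through the current interpreter. `ensure_spacy_model` is a hypothetical helper; its `run` parameter exists only so the logic can be exercised without network access.

```python
import importlib.util
import subprocess
import sys
from typing import Callable


def ensure_spacy_model(
    model: str = "en_core_web_sm",
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Download a spaCy model package if it is not installed.

    Returns True if a download was triggered, False if the model was already present.
    """
    if importlib.util.find_spec(model) is not None:
        return False
    # Same command as the manual setup step, run with the current interpreter.
    run([sys.executable, "-m", "spacy", "download", model], check=True)
    # Let the import system see the newly installed package in this process.
    importlib.invalidate_caches()
    return True
```

In an app, `ensure_spacy_model()` would be called once before `spacy.load("en_core_web_sm")`.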
## API Integration

Send a POST request to `/api/predict` whose JSON body carries the image as a base64-encoded data URL in the `data` list:
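```python
import base64

import requests

# Replace "your-username" with the account that owns the Space.
api_url = "https://your-username-rag-image-captioning.hf.space/api/predict"

# Read the image and encode it as a base64 data URL.
with open("test_image.jpg", "rb") as f:
    image_bytes = f.read()
base64_image = f"data:image/jpeg;base64,{base64.b64encode(image_bytes).decode()}"

# Inputs go in the "data" list; the generated caption comes back the same way.
response = requests.post(api_url, json={"data": [base64_image]}, timeout=60)
response.raise_for_status()
print(response.json()["data"][0])
```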
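The request in this section hard-codes `image/jpeg` in the data URL. If you also want to send PNG or other formats, a small variation can derive the MIME type from the file name with the standard library's `mimetypes` module. This is a sketch, not part of the Space: `to_data_url` and `build_payload` are hypothetical helper names, and the payload keeps the same `{"data": [...]}` shape.

```python
import base64
import mimetypes
from typing import Dict, List


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(image_bytes: bytes, filename: str) -> Dict[str, List[str]]:
    """Wrap the encoded image in the {"data": [...]} request body."""
    return {"data": [to_data_url(image_bytes, filename)]}


print(to_data_url(b"abc", "photo.png"))  # data:image/png;base64,YWJj
```

Pass the result of `build_payload(...)` as the `json=` argument to `requests.post` in place of the inline dictionary.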