---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---
# LoRA Caption Assistant
An AI-powered web application for generating high-quality, detailed captions for image and video datasets. It is tailored for training LoRA (Low-Rank Adaptation) models and uses Google's Gemini API or a local Qwen model (served via vLLM) to automate the captioning process.
## Features
- Automated Captioning: Generates detailed, objective descriptions using Gemini 2.5 Pro or local Qwen-VL.
- LoRA Optimized: Automatic trigger word insertion and style-agnostic descriptions.
- Multi-Modal: Supports both image and video inputs.
- Character Tagging: Optional automatic identification and tagging of specific characters.
- Quality Assurance: AI-powered scoring system to evaluate caption quality (1-5 scale).
- Batch Processing: Robust queue system with rate limiting (RPM) and batch sizes.
- Export: Downloads the dataset (media + text files) as a ZIP file.
## 🚀 Deployment on Hugging Face Spaces
This is the recommended way to run the application if you don't have a GPU.
### Step 1: Create a Space
- Go to [Hugging Face Spaces](https://huggingface.co/spaces).
- Click Create new Space.
- Enter a name (e.g., `lora-caption-assistant`).
- Select Docker as the SDK.
- Choose a "Blank" or "Public" template.
- Click Create Space.
### Step 2: Upload Files
Upload the contents of this repository to your Space. Ensure the following files are in the root directory:
- `Dockerfile` (Critical: the app will fail without this)
- `package.json`
- `vite.config.ts`
- `index.html`
- `src/` folder (containing `App.tsx`, etc.)
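For reference, a Dockerfile for a Vite app on a Docker Space typically builds the bundle and serves it on the port declared as `app_port` (7860). This is only a minimal sketch under assumed choices (Node 18 base image, `vite preview` as the server); the repository's actual Dockerfile may differ:

```dockerfile
# Minimal sketch -- base image and serve command are assumptions,
# not necessarily what this repository's Dockerfile does.
FROM node:18-alpine
WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci

# Copy the source and build the static bundle
COPY . .
RUN npm run build

# Hugging Face Spaces routes traffic to the app_port declared above (7860)
EXPOSE 7860
CMD ["npx", "vite", "preview", "--host", "0.0.0.0", "--port", "7860"]
```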
### Step 3: Configure API Key (For Gemini)
- In your Space, go to Settings.
- Scroll to Variables and secrets.
- Click New secret.
- Name: `API_KEY`
- Value: your Google Gemini API key.
## 🤖 Local Qwen Setup Guide
If you have a powerful NVIDIA GPU (12GB+ VRAM recommended), you can run the captioning model locally for free and connect this web app to it.
### Prerequisites
- OS: Windows or Linux
- GPU: NVIDIA GPU (CUDA support)
- Software: Python 3.10+ and CUDA Toolkit installed.
### Step 1: Get the Script
- Open the LoRA Caption Assistant web app.
- Under AI Provider, select Local Qwen (GPU).
- Select your desired model (e.g., Qwen 2.5 VL 7B).
- Set your desired install folder path.
- Click Download Setup Script.
### Step 2: Run the Server
- Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file and run it.
- The script will:
  - Create a Python virtual environment.
  - Install `vllm`.
  - Download the selected Qwen model from Hugging Face.
  - Start an OpenAI-compatible API server on port 8000.
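Because the server is OpenAI-compatible, you can sanity-check it from the command line before pointing the app at it. The request below is a sketch: the model id is an assumption — use whichever model the setup script actually downloaded.

```shell
# Build a chat-completions request body like the ones the web app sends.
# The model id is an assumption; match the model your setup script installed.
cat > request.json <<'EOF'
{
  "model": "Qwen/Qwen2.5-VL-7B-Instruct",
  "messages": [
    {"role": "user", "content": "Describe this image objectively."}
  ],
  "max_tokens": 256
}
EOF

# With the server running on port 8000, send it (uncomment to use):
# curl http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @request.json
cat request.json
```

If the server is healthy, the response is a standard chat-completions JSON object with the caption in `choices[0].message.content`.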
### Step 3: Connect to the App

#### Scenario A: Running the App Locally (localhost)
If you are running this web app on your own computer (`npm run dev`), simply set the Endpoint in the app to `http://localhost:8000/v1`.

#### Scenario B: Running the App on Hugging Face (HTTPS)
If you are accessing the web app via Hugging Face Spaces, you cannot connect to `localhost` directly due to browser security (Mixed Content Blocking). You must create a secure tunnel.
How to Tunnel:
- Cloudflare Tunnel (easiest):
  - Download `cloudflared`.
  - Run: `cloudflared tunnel --url http://localhost:8000`
  - Copy the URL ending in `.trycloudflare.com`.
- Paste the URL:
  - Paste this secure URL into the Local Endpoint field in the web app.
  - Add `/v1` to the end (e.g., `https://example.trycloudflare.com/v1`).
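The last two steps (copy the tunnel URL, append `/v1`) amount to one line of shell; `example.trycloudflare.com` stands in for whatever hostname cloudflared assigns you:

```shell
# Replace with the URL cloudflared printed for your tunnel
TUNNEL_URL="https://example.trycloudflare.com"

# Strip any trailing slash, then append the OpenAI-compatible path
ENDPOINT="${TUNNEL_URL%/}/v1"
echo "$ENDPOINT"   # https://example.trycloudflare.com/v1
```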
## 💻 Local Development (Web App)
### Prerequisites
- Node.js (v18+)
- npm
### Installation
1. Clone the repo:
   ```shell
   git clone <your-repo-url>
   cd lora-caption-assistant
   ```
2. Install dependencies: `npm install`
3. Run the app: `npm run dev`
4. Open `http://localhost:5173` in your browser.
## License
MIT