---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---
# LoRA Caption Assistant

An AI-powered web application for generating high-quality, detailed captions for image and video datasets. It is tailored to training LoRA (Low-Rank Adaptation) models and uses Google's Gemini API or a local Qwen model (served via vLLM) to automate the captioning process.

## Features

* **Automated Captioning**: Generates detailed, objective descriptions using Gemini 2.5 Pro or a local Qwen-VL model.
* **LoRA Optimized**: Automatic trigger-word insertion and style-agnostic descriptions.
* **Multi-Modal**: Supports both image and video inputs.
* **Character Tagging**: Optional automatic identification and tagging of specific characters.
* **Quality Assurance**: AI-powered scoring that rates caption quality on a 1–5 scale.
* **Batch Processing**: Robust queue system with configurable rate limits (requests per minute) and batch sizes.
* **Export**: Downloads the dataset (media plus matching text files) as a ZIP file.
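The exported ZIP pairs each media file with a text caption. A minimal sketch of that layout (the same-basename `.txt` convention and leading trigger word are assumptions common to LoRA training pipelines, not confirmed specifics of this app):

```bash
# Build a tiny example of the media + caption pairing the export produces.
# Assumption: each media file gets a same-named .txt caption beside it,
# and each caption begins with the LoRA trigger word.
mkdir -p dataset
touch dataset/clip_001.mp4
printf 'myTrigger, a person walking through a park at midday\n' > dataset/clip_001.txt
ls dataset
```

Most LoRA trainers read each `.txt` file as the caption for the media file sharing its basename.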
---

## 🚀 Deployment on Hugging Face Spaces

This is the recommended way to run the application if you don't have a GPU.

### Step 1: Create a Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **Create new Space**.
3. Enter a name (e.g., `lora-caption-assistant`).
4. Select **Docker** as the SDK.
5. Choose the **Blank** template and set the visibility (e.g., Public).
6. Click **Create Space**.
### Step 2: Upload Files

Upload the contents of this repository to your Space. Ensure the following files are in the **root** directory:

* `Dockerfile` (critical: the app will fail without this)
* `package.json`
* `vite.config.ts`
* `index.html`
* `src/` folder (containing `App.tsx`, etc.)
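Instead of the web uploader, you can push the files with git. A sketch, assuming git is installed, you are authenticated to Hugging Face, and `your-username` is replaced with your actual username:

```bash
# Push the app files to the Space's git repo instead of using the web UI.
SPACE_URL="https://huggingface.co/spaces/your-username/lora-caption-assistant"
if [ -d lora-caption-assistant/.git ]; then
  cd lora-caption-assistant
  git add Dockerfile package.json vite.config.ts index.html src/
  git commit -m "Add app files"
  git push
else
  # Clone the Space repo first, then copy the files in and re-run.
  echo "clone first: git clone ${SPACE_URL}"
fi
```

Spaces rebuild automatically after each push, so the app redeploys on `git push`.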
### Step 3: Configure API Key (For Gemini)

1. In your Space, go to **Settings**.
2. Scroll to **Variables and secrets**.
3. Click **New secret**.
4. **Name**: `API_KEY`
5. **Value**: your Google Gemini API key.
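You can sanity-check the key from any shell where it is exported. The `v1beta/models` listing route is part of Google's public Generative Language API; treat the exact URL as an assumption in case Google revises it:

```bash
# Smoke-test a Gemini API key by listing available models.
# Prints the HTTP status code: 200 means the key is accepted.
if [ -n "${API_KEY:-}" ] && command -v curl >/dev/null 2>&1; then
  curl -s -o /dev/null -w '%{http_code}\n' \
    "https://generativelanguage.googleapis.com/v1beta/models?key=${API_KEY}"
else
  echo "API_KEY not set; export it or add it as a Space secret first"
fi
```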
---

## 🤖 Local Qwen Setup Guide

If you have a powerful NVIDIA GPU (12 GB+ VRAM recommended), you can run the captioning model **locally for free** and connect this web app to it.

### Prerequisites

* **OS**: Windows or Linux
* **GPU**: NVIDIA GPU with CUDA support
* **Software**: Python 3.10+ and the CUDA Toolkit installed
### Step 1: Get the Script

1. Open the LoRA Caption Assistant web app.
2. Under **AI Provider**, select **Local Qwen (GPU)**.
3. Select your desired model (e.g., `Qwen 2.5 VL 7B`).
4. Set your desired install folder path.
5. Click **Download Setup Script**.
### Step 2: Run the Server

1. Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file.
2. Run it.
3. The script will:
   * Create a Python virtual environment.
   * Install `vllm`.
   * Download the selected Qwen model from Hugging Face.
   * Start an OpenAI-compatible API server on port 8000.
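For reference, the Linux script boils down to something like the following. This is a sketch, not the generated script itself; the `vllm serve` CLI and the `Qwen/Qwen2.5-VL-7B-Instruct` model ID are assumptions based on current vLLM and Hugging Face naming:

```bash
# Rough equivalent of the generated Linux setup script.
python3 -m venv qwen-env          # 1. isolated Python environment
. qwen-env/bin/activate
# 2. the real script installs vLLM here (large download, needs a CUDA box):
#    pip install vllm
# 3-4. on first launch vLLM pulls the model from Hugging Face and serves an
#      OpenAI-compatible API on port 8000:
#    vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000
deactivate
```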
### Step 3: Connect to the App

**Scenario A: Running the App Locally (localhost)**

* If you are running this web app on your own computer (`npm run dev`), set the endpoint in the app to `http://localhost:8000/v1`.

**Scenario B: Running the App on Hugging Face (HTTPS)**

* If you are accessing the web app via Hugging Face Spaces, you **cannot** connect to `localhost` directly: browsers block mixed content (an HTTPS page calling a plain-HTTP endpoint).
* You must create a secure tunnel instead.

**How to Tunnel:**

1. **Cloudflare Tunnel (easiest)**:
   * Download `cloudflared`.
   * Run: `cloudflared tunnel --url http://localhost:8000`
   * Copy the URL ending in `.trycloudflare.com`.
2. **Paste the URL**:
   * Paste this secure URL into the **Local Endpoint** field in the web app.
   * Append `/v1` (e.g., `https://example.trycloudflare.com/v1`).
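In both scenarios, the value the app needs is the server's base URL plus a `/v1` suffix. With the server (and tunnel) running, the OpenAI-compatible `GET /v1/models` route makes a quick liveness check:

```bash
# Assemble the endpoint the app expects: base URL + /v1 suffix.
TUNNEL_URL="https://example.trycloudflare.com"   # printed by cloudflared
ENDPOINT="${TUNNEL_URL}/v1"
echo "${ENDPOINT}"
# With the server and tunnel live, verify it answers:
#   curl "${ENDPOINT}/models"
```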
---

## 💻 Local Development (Web App)

### Prerequisites

* Node.js (v18+)
* npm

### Installation

1. Clone the repo:
   ```bash
   git clone <your-repo-url>
   cd lora-caption-assistant
   ```
2. Install dependencies:
   ```bash
   npm install
   ```
3. Run the app:
   ```bash
   npm run dev
   ```
   Open `http://localhost:5173` in your browser.
## License

MIT