---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---
# LoRA Caption Assistant
An AI-powered web application that helps generate high-quality, detailed captions for image and video datasets. It is tailored for training LoRA (Low-Rank Adaptation) models and uses Google's Gemini API or a local Qwen model (served via vLLM) to automate the captioning process.
## Features
* **Automated Captioning**: Generates detailed, objective descriptions using Gemini 2.5 Pro or local Qwen-VL.
* **LoRA Optimized**: Automatic trigger word insertion and style-agnostic descriptions.
* **Multi-Modal**: Supports both image and video inputs.
* **Character Tagging**: Optional automatic identification and tagging of specific characters.
* **Quality Assurance**: AI-powered scoring system to evaluate caption quality (1-5 scale).
* **Batch Processing**: Robust queue system with configurable rate limits (requests per minute) and batch sizes.
* **Export**: Downloads the dataset (media + text files) as a ZIP file.
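The pacing behind the batch queue is simple to reason about: at a given RPM limit, requests must be spaced at least `60000 / rpm` milliseconds apart. A minimal TypeScript sketch (these helper names are illustrative, not taken from the app's source):

```typescript
// Split a dataset into batches and compute the minimum spacing
// between requests for a given requests-per-minute (RPM) limit.
// Illustrative helpers; the app's internals may differ.

function chunk<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

function minDelayMs(rpm: number): number {
  // 60,000 ms per minute divided by the allowed requests per minute.
  return Math.ceil(60_000 / rpm);
}
```

For example, a limit of 10 RPM means at least 6000 ms between calls, and 25 files at batch size 10 yield batches of 10, 10, and 5.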
---
## 🚀 Deployment on Hugging Face Spaces
This is the recommended way to run the application if you don't have a GPU.
### Step 1: Create a Space
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **Create new Space**.
3. Enter a name (e.g., `lora-caption-assistant`).
4. Select **Docker** as the SDK.
5. Choose the **Blank** template and set the visibility (Public or Private).
6. Click **Create Space**.
### Step 2: Upload Files
Upload the contents of this repository to your Space. Ensure the following files are in the **root** directory:
* `Dockerfile` (Critical: The app will fail without this)
* `package.json`
* `vite.config.ts`
* `index.html`
* `src/` folder (containing `App.tsx`, etc.)
### Step 3: Configure API Key (For Gemini)
1. In your Space, go to **Settings**.
2. Scroll to **Variables and secrets**.
3. Click **New secret**.
4. **Name**: `API_KEY`
5. **Value**: Your Google Gemini API Key.
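On Docker Spaces, a secret configured this way is exposed to the container as an environment variable. A minimal sketch of reading it server-side, assuming a Node.js runtime (the helper name is hypothetical; the app's actual wiring may differ, e.g. Vite can inline the value at build time):

```typescript
// Read the Gemini key from the environment and fail fast if it is
// missing, so a misconfigured Space surfaces a clear error.
// Hypothetical helper, not from the app's source.

function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["API_KEY"];
  if (!key) {
    throw new Error("API_KEY is not set; add it as a Space secret.");
  }
  return key;
}

// Usage: const apiKey = requireApiKey(process.env);
```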
---
## 🤖 Local Qwen Setup Guide
If you have a powerful NVIDIA GPU (12GB+ VRAM recommended), you can run the captioning model **locally for free** and connect this web app to it.
### Prerequisites
* **OS**: Windows or Linux
* **GPU**: NVIDIA GPU (CUDA support)
* **Software**: Python 3.10+ and CUDA Toolkit installed.
### Step 1: Get the Script
1. Open the LoRA Caption Assistant Web App.
2. Under **AI Provider**, select **Local Qwen (GPU)**.
3. Select your desired model (e.g., `Qwen 2.5 VL 7B`).
4. Set your desired install folder path.
5. Click **Download Setup Script**.
### Step 2: Run the Server
1. Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file.
2. Run it.
3. The script will:
* Create a Python virtual environment.
* Install `vllm`.
* Download the selected Qwen model from Hugging Face.
* Start an OpenAI-compatible API server on port 8000.
### Step 3: Connect to the App
**Scenario A: Running App Locally (localhost)**
* If you are running this web app on your own computer (`npm run dev`), simply set the Endpoint in the app to: `http://localhost:8000/v1`
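For reference, a vision request to the vLLM server follows the OpenAI-compatible chat-completions format. A rough TypeScript sketch (the model name and prompt are placeholders; the app builds its own payload):

```typescript
// Build an OpenAI-compatible chat-completions payload that sends a
// base64-encoded image for captioning. The message shape follows
// the OpenAI vision format that vLLM's server accepts.

function buildCaptionRequest(base64Png: string, model: string): any {
  return {
    model, // e.g. "Qwen/Qwen2.5-VL-7B-Instruct" -- placeholder name
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image in detail." },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          },
        ],
      },
    ],
  };
}

// POST this body to `${endpoint}/chat/completions`, where endpoint
// is e.g. http://localhost:8000/v1.
```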
**Scenario B: Running App on Hugging Face (HTTPS)**
* If you are accessing the web app via Hugging Face Spaces, you **cannot** point it at `localhost` directly: the Space is served over HTTPS, and browsers block requests from HTTPS pages to insecure local addresses (mixed content blocking).
* You must create a secure tunnel.
**How to Tunnel:**
1. **Cloudflare Tunnel (Easiest)**:
* Download `cloudflared`.
* Run: `cloudflared tunnel --url http://localhost:8000`
* Copy the URL ending in `.trycloudflare.com`.
2. **Paste the URL**:
* Paste this secure URL into the **Local Endpoint** field in the Web App.
* Add `/v1` to the end (e.g., `https://example.trycloudflare.com/v1`).
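The trailing-slash and `/v1` bookkeeping is easy to get wrong when pasting URLs; a small helper makes it foolproof (hypothetical; the app may handle this differently):

```typescript
// Normalize a user-pasted endpoint: trim whitespace, drop trailing
// slashes, and append "/v1" if it is not already present.

function normalizeEndpoint(raw: string): string {
  let url = raw.trim().replace(/\/+$/, "");
  if (!url.endsWith("/v1")) {
    url += "/v1";
  }
  return url;
}

// normalizeEndpoint("https://example.trycloudflare.com/")
//   -> "https://example.trycloudflare.com/v1"
```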
---
## 💻 Local Development (Web App)
### Prerequisites
* Node.js (v18+)
* npm
### Installation
1. Clone the repo:
```bash
git clone <your-repo-url>
cd lora-caption-assistant
```
2. Install dependencies:
```bash
npm install
```
3. Run the app:
```bash
npm run dev
```
Open `http://localhost:5173` in your browser.
## License
MIT