---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---

# LoRA Caption Assistant

An AI-powered web application designed to assist in generating high-quality, detailed captions for image and video datasets. This tool is specifically tailored for training LoRA (Low-Rank Adaptation) models, utilizing Google's Gemini API or a Local Qwen Model (via vLLM) to automate the captioning process.

## Features

- **Automated Captioning**: Generates detailed, objective descriptions using Gemini 2.5 Pro or a local Qwen-VL model.
- **LoRA Optimized**: Automatic trigger word insertion and style-agnostic descriptions.
- **Multi-Modal**: Supports both image and video inputs.
- **Character Tagging**: Optional automatic identification and tagging of specific characters.
- **Quality Assurance**: AI-powered scoring that rates caption quality on a 1-5 scale.
- **Batch Processing**: Robust queue system with configurable rate limits (requests per minute) and batch sizes.
- **Export**: Downloads the dataset (media plus matching `.txt` caption files) as a ZIP archive.
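
The trigger-word feature above amounts to prepending a chosen token to each generated caption before it is saved to the matching `.txt` file. A minimal TypeScript sketch of that logic (`addTriggerWord` is an illustrative name, not the app's actual code):

```typescript
// Prepend a trigger word to a generated caption, unless it is already there.
// Illustrative sketch only; the app's real implementation may differ.
function addTriggerWord(caption: string, trigger: string): string {
  const trimmed = caption.trim();
  if (trimmed.toLowerCase().startsWith(trigger.toLowerCase())) {
    return trimmed; // avoid duplicating the trigger on re-runs
  }
  return `${trigger}, ${trimmed}`;
}
```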

## 🚀 Deployment on Hugging Face Spaces

This is the recommended way to run the application if you don't have a GPU.

### Step 1: Create a Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **Create new Space**.
3. Enter a name (e.g., `lora-caption-assistant`).
4. Select **Docker** as the SDK.
5. Choose the **Blank** template and set the visibility (Public or Private).
6. Click **Create Space**.

### Step 2: Upload Files

Upload the contents of this repository to your Space. Ensure the following files are in the root directory:

- `Dockerfile` (critical: the app will fail to build without it)
- `package.json`
- `vite.config.ts`
- `index.html`
- `src/` folder (containing `App.tsx`, etc.)

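If you need a starting point, a Dockerfile along these lines builds the Vite app and serves it on the port declared in the metadata (7860). This is a sketch under assumptions about the build setup, not the repository's actual Dockerfile:

```dockerfile
# Sketch only: assumes a standard Vite build; the repo's real Dockerfile may differ.
FROM node:18-slim
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Hugging Face Spaces routes traffic to app_port (7860).
EXPOSE 7860
CMD ["npx", "vite", "preview", "--host", "0.0.0.0", "--port", "7860"]
```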
### Step 3: Configure API Key (For Gemini)

1. In your Space, go to **Settings**.
2. Scroll to **Variables and secrets**.
3. Click **New secret**.
4. Name: `API_KEY`
5. Value: your Google Gemini API key.
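
Inside the Space, the secret is exposed to the server process as an environment variable. A hedged sketch of how server-side code might read it (`getApiKey` is an illustrative helper, not the app's actual API):

```typescript
// Read the Gemini key from the environment; fail fast if the secret is missing.
// Illustrative helper; the app's actual code may read the secret differently.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env["API_KEY"];
  if (!key) {
    throw new Error("API_KEY is not set; add it under Settings > Variables and secrets.");
  }
  return key;
}

// In a Node process this would be called as: getApiKey(process.env)
```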

## 🤖 Local Qwen Setup Guide

If you have a powerful NVIDIA GPU (12GB+ VRAM recommended), you can run the captioning model locally for free and connect this web app to it.

### Prerequisites

- **OS**: Windows or Linux
- **GPU**: NVIDIA GPU with CUDA support
- **Software**: Python 3.10+ and the CUDA Toolkit installed
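
You can sanity-check these prerequisites from a terminal before running the setup script (commands shown for Linux; the `nvidia-smi` check is guarded so the snippet does not fail on machines without the driver):

```shell
# Check the Python version (3.10 or newer is required).
python3 --version

# Confirm the NVIDIA driver is visible; nvidia-smi also reports the CUDA version.
command -v nvidia-smi >/dev/null && nvidia-smi || echo "nvidia-smi not found: install the NVIDIA driver"
```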

### Step 1: Get the Script

  1. Open the LoRA Caption Assistant Web App.
  2. Under AI Provider, select Local Qwen (GPU).
  3. Select your desired model (e.g., Qwen 2.5 VL 7B).
  4. Set your desired install folder path.
  5. Click Download Setup Script.

### Step 2: Run the Server

1. Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file.
2. Run it (on Linux you may need to make it executable first with `chmod +x`).
3. The script will:
   - Create a Python virtual environment.
   - Install vLLM.
   - Download the selected Qwen model from Hugging Face.
   - Start an OpenAI-compatible API server on port 8000.
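
The generated script roughly corresponds to the following shell commands. This is a sketch under assumptions: the real script is produced by the app and may differ, and the model ID and environment name shown are examples.

```shell
# Sketch of what the generated Linux setup script does; names are examples.
python3 -m venv qwen-env
source qwen-env/bin/activate
pip install vllm

# vLLM downloads the model from Hugging Face on first run, then
# serves an OpenAI-compatible API on port 8000.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000
```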

### Step 3: Connect to the App

#### Scenario A: Running the App Locally (localhost)

If you are running this web app on your own computer (`npm run dev`), simply set the Endpoint in the app to `http://localhost:8000/v1`.

#### Scenario B: Running the App on Hugging Face (HTTPS)

If you are accessing the web app via Hugging Face Spaces, you cannot connect to `localhost` directly, because browsers block mixed content (an HTTPS page calling a plain-HTTP endpoint). You must create a secure tunnel instead.

#### How to Tunnel

1. **Cloudflare Tunnel** (easiest):
   - Download `cloudflared`.
   - Run: `cloudflared tunnel --url http://localhost:8000`
   - Copy the URL ending in `.trycloudflare.com`.
2. **Paste the URL**:
   - Paste this secure URL into the **Local Endpoint** field in the web app.
   - Add `/v1` to the end (e.g., `https://example.trycloudflare.com/v1`).
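
Whichever scenario you use, the endpoint the app calls must end in `/v1`. A small sketch of that normalization (`normalizeEndpoint` is an illustrative name, not the app's actual function):

```typescript
// Ensure a user-supplied endpoint ends with the OpenAI-compatible /v1 path.
// Illustrative sketch; the app may handle this differently.
function normalizeEndpoint(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith("/v1") ? trimmed : `${trimmed}/v1`;
}
```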

## 💻 Local Development (Web App)

### Prerequisites

- Node.js (v18+)
- npm

### Installation

1. Clone the repo:

   ```bash
   git clone <your-repo-url>
   cd lora-caption-assistant
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Run the app:

   ```bash
   npm run dev
   ```

   Open `http://localhost:5173` in your browser.

## License

MIT