---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---
# LoRA Caption Assistant
An AI-powered web application that helps generate high-quality, detailed captions for image and video datasets. It is tailored for training LoRA (Low-Rank Adaptation) models and uses Google's Gemini API or a local Qwen model (served via vLLM) to automate the captioning process.
## Features
* **Automated Captioning**: Generates detailed, objective descriptions using Gemini 2.5 Pro or local Qwen-VL.
* **LoRA Optimized**: Automatic trigger word insertion and style-agnostic descriptions.
* **Multi-Modal**: Supports both image and video inputs.
* **Character Tagging**: Optional automatic identification and tagging of specific characters.
* **Quality Assurance**: AI-powered scoring system to evaluate caption quality (1-5 scale).
* **Batch Processing**: Robust queue system with configurable rate limits (requests per minute) and batch sizes.
* **Export**: Downloads the dataset (media + text files) as a ZIP file.
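The pacing behind the batch queue is simple to reason about: at a given RPM limit, requests must be spaced at least `60000 / rpm` milliseconds apart. A minimal TypeScript sketch (these helper names are illustrative, not taken from the app's source):

```typescript
// Split a dataset into batches and compute the minimum spacing
// between requests for a given requests-per-minute (RPM) limit.
// Illustrative helpers; the app's internals may differ.

function chunk<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

function minDelayMs(rpm: number): number {
  // 60,000 ms per minute divided by the allowed requests per minute.
  return Math.ceil(60_000 / rpm);
}
```

For example, a limit of 10 RPM means at least 6000 ms between calls, and 25 files at batch size 10 yield batches of 10, 10, and 5.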
---
## 🚀 Deployment on Hugging Face Spaces
This is the recommended way to run the application if you don't have a GPU.
### Step 1: Create a Space
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **Create new Space**.
3. Enter a name (e.g., `lora-caption-assistant`).
4. Select **Docker** as the SDK.
5. Choose the **Blank** template and set the visibility (Public or Private).
6. Click **Create Space**.
### Step 2: Upload Files
Upload the contents of this repository to your Space. Ensure the following files are in the **root** directory:
* `Dockerfile` (Critical: The app will fail without this)
* `package.json`
* `vite.config.ts`
* `index.html`
* `src/` folder (containing `App.tsx`, etc.)
### Step 3: Configure API Key (For Gemini)
1. In your Space, go to **Settings**.
2. Scroll to **Variables and secrets**.
3. Click **New secret**.
4. **Name**: `API_KEY`
5. **Value**: Your Google Gemini API Key.
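On Docker Spaces, a secret configured this way is exposed to the container as an environment variable. A minimal sketch of reading it server-side, assuming a Node.js runtime (the helper name is hypothetical; the app's actual wiring may differ, e.g. Vite can inline the value at build time):

```typescript
// Read the Gemini key from the environment and fail fast if it is
// missing, so a misconfigured Space surfaces a clear error.
// Hypothetical helper, not from the app's source.

function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["API_KEY"];
  if (!key) {
    throw new Error("API_KEY is not set; add it as a Space secret.");
  }
  return key;
}

// Usage: const apiKey = requireApiKey(process.env);
```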
---
## 🤖 Local Qwen Setup Guide
If you have a powerful NVIDIA GPU (12GB+ VRAM recommended), you can run the captioning model **locally for free** and connect this web app to it.
### Prerequisites
* **OS**: Windows or Linux
* **GPU**: NVIDIA GPU (CUDA support)
* **Software**: Python 3.10+ and CUDA Toolkit installed.
### Step 1: Get the Script
1. Open the LoRA Caption Assistant Web App.
2. Under **AI Provider**, select **Local Qwen (GPU)**.
3. Select your desired model (e.g., `Qwen 2.5 VL 7B`).
4. Set your desired install folder path.
5. Click **Download Setup Script**.
### Step 2: Run the Server
1. Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file.
2. Run it.
3. The script will:
* Create a Python virtual environment.
* Install `vllm`.
* Download the selected Qwen model from Hugging Face.
* Start an OpenAI-compatible API server on port 8000.
### Step 3: Connect to the App
**Scenario A: Running App Locally (localhost)**
* If you are running this web app on your own computer (`npm run dev`), simply set the Endpoint in the app to: `http://localhost:8000/v1`
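For reference, a vision request to the vLLM server follows the OpenAI-compatible chat-completions format. A rough TypeScript sketch (the model name and prompt are placeholders; the app builds its own payload):

```typescript
// Build an OpenAI-compatible chat-completions payload that sends a
// base64-encoded image for captioning. The message shape follows
// the OpenAI vision format that vLLM's server accepts.

function buildCaptionRequest(base64Png: string, model: string): any {
  return {
    model, // e.g. "Qwen/Qwen2.5-VL-7B-Instruct" -- placeholder name
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe this image in detail." },
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${base64Png}` },
          },
        ],
      },
    ],
  };
}

// POST this body to `${endpoint}/chat/completions`, where endpoint
// is e.g. http://localhost:8000/v1.
```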
**Scenario B: Running App on Hugging Face (HTTPS)**
* If you are accessing the web app via Hugging Face Spaces, you **cannot** point it at `localhost` directly: the Space is served over HTTPS, and browsers block requests from HTTPS pages to insecure local addresses (mixed content blocking).
* You must create a secure tunnel.
**How to Tunnel:**
1. **Cloudflare Tunnel (Easiest)**:
* Download `cloudflared`.
* Run: `cloudflared tunnel --url http://localhost:8000`
* Copy the URL ending in `.trycloudflare.com`.
2. **Paste the URL**:
* Paste this secure URL into the **Local Endpoint** field in the Web App.
* Add `/v1` to the end (e.g., `https://example.trycloudflare.com/v1`).
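The trailing-slash and `/v1` bookkeeping is easy to get wrong when pasting URLs; a small helper makes it foolproof (hypothetical; the app may handle this differently):

```typescript
// Normalize a user-pasted endpoint: trim whitespace, drop trailing
// slashes, and append "/v1" if it is not already present.

function normalizeEndpoint(raw: string): string {
  let url = raw.trim().replace(/\/+$/, "");
  if (!url.endsWith("/v1")) {
    url += "/v1";
  }
  return url;
}

// normalizeEndpoint("https://example.trycloudflare.com/")
//   -> "https://example.trycloudflare.com/v1"
```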
---
## 💻 Local Development (Web App)
### Prerequisites
* Node.js (v18+)
* npm
### Installation
1. Clone the repo:
```bash
git clone <your-repo-url>
cd lora-caption-assistant
```
2. Install dependencies:
```bash
npm install
```
3. Run the app:
```bash
npm run dev
```
Open `http://localhost:5173` in your browser.
## License
MIT