---
title: LoRA Caption Assistant
emoji: 🖼️
colorFrom: gray
colorTo: indigo
sdk: docker
app_port: 7860
---

# LoRA Caption Assistant

An AI-powered web application designed to assist in generating high-quality, detailed captions for image and video datasets. This tool is specifically tailored for training LoRA (Low-Rank Adaptation) models, utilizing Google's Gemini API or a Local Qwen Model (via vLLM) to automate the captioning process.

## Features

- **Automated Captioning**: Generates detailed, objective descriptions using Gemini 2.5 Pro or a local Qwen-VL model.
- **LoRA Optimized**: Automatic trigger word insertion and style-agnostic descriptions.
- **Multi-Modal**: Supports both image and video inputs.
- **Character Tagging**: Optional automatic identification and tagging of specific characters.
- **Quality Assurance**: AI-powered scoring that rates caption quality on a 1-5 scale.
- **Batch Processing**: Robust queue system with configurable rate limits (requests per minute) and batch sizes.
- **Export**: Downloads the dataset (media plus matching `.txt` caption files) as a ZIP archive.
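
The trigger-word feature above amounts to prepending a chosen token to each generated caption before it is saved to the matching `.txt` file. A minimal TypeScript sketch of that logic (`addTriggerWord` is an illustrative name, not the app's actual code):

```typescript
// Prepend a trigger word to a generated caption, unless it is already there.
// Illustrative sketch only; the app's real implementation may differ.
function addTriggerWord(caption: string, trigger: string): string {
  const trimmed = caption.trim();
  if (trimmed.toLowerCase().startsWith(trigger.toLowerCase())) {
    return trimmed; // avoid duplicating the trigger on re-runs
  }
  return `${trigger}, ${trimmed}`;
}
```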

## 🚀 Deployment on Hugging Face Spaces

This is the recommended way to run the application if you don't have a GPU.

### Step 1: Create a Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **Create new Space**.
3. Enter a name (e.g., `lora-caption-assistant`).
4. Select **Docker** as the SDK.
5. Choose the **Blank** template and set the visibility (Public or Private).
6. Click **Create Space**.

### Step 2: Upload Files

Upload the contents of this repository to your Space. Ensure the following files are in the root directory:

- `Dockerfile` (critical: the app will fail to build without it)
- `package.json`
- `vite.config.ts`
- `index.html`
- `src/` folder (containing `App.tsx`, etc.)

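If you need a starting point, a Dockerfile along these lines builds the Vite app and serves it on the port declared in the metadata (7860). This is a sketch under assumptions about the build setup, not the repository's actual Dockerfile:

```dockerfile
# Sketch only: assumes a standard Vite build; the repo's real Dockerfile may differ.
FROM node:18-slim
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
# Hugging Face Spaces routes traffic to app_port (7860).
EXPOSE 7860
CMD ["npx", "vite", "preview", "--host", "0.0.0.0", "--port", "7860"]
```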
### Step 3: Configure API Key (For Gemini)

1. In your Space, go to **Settings**.
2. Scroll to **Variables and secrets**.
3. Click **New secret**.
4. Name: `API_KEY`
5. Value: your Google Gemini API key.
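
Inside the Space, the secret is exposed to the server process as an environment variable. A hedged sketch of how server-side code might read it (`getApiKey` is an illustrative helper, not the app's actual API):

```typescript
// Read the Gemini key from the environment; fail fast if the secret is missing.
// Illustrative helper; the app's actual code may read the secret differently.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env["API_KEY"];
  if (!key) {
    throw new Error("API_KEY is not set; add it under Settings > Variables and secrets.");
  }
  return key;
}

// In a Node process this would be called as: getApiKey(process.env)
```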

## 🤖 Local Qwen Setup Guide

If you have a powerful NVIDIA GPU (12GB+ VRAM recommended), you can run the captioning model locally for free and connect this web app to it.

### Prerequisites

- **OS**: Windows or Linux
- **GPU**: NVIDIA GPU with CUDA support
- **Software**: Python 3.10+ and the CUDA Toolkit installed
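
You can sanity-check these prerequisites from a terminal before running the setup script (commands shown for Linux; the `nvidia-smi` check is guarded so the snippet does not fail on machines without the driver):

```shell
# Check the Python version (3.10 or newer is required).
python3 --version

# Confirm the NVIDIA driver is visible; nvidia-smi also reports the CUDA version.
command -v nvidia-smi >/dev/null && nvidia-smi || echo "nvidia-smi not found: install the NVIDIA driver"
```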

### Step 1: Get the Script

  1. Open the LoRA Caption Assistant Web App.
  2. Under AI Provider, select Local Qwen (GPU).
  3. Select your desired model (e.g., Qwen 2.5 VL 7B).
  4. Set your desired install folder path.
  5. Click Download Setup Script.

### Step 2: Run the Server

1. Locate the downloaded `.bat` (Windows) or `.sh` (Linux) file.
2. Run it (on Linux you may need to make it executable first with `chmod +x`).
3. The script will:
   - Create a Python virtual environment.
   - Install vLLM.
   - Download the selected Qwen model from Hugging Face.
   - Start an OpenAI-compatible API server on port 8000.
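
The generated script roughly corresponds to the following shell commands. This is a sketch under assumptions: the real script is produced by the app and may differ, and the model ID and environment name shown are examples.

```shell
# Sketch of what the generated Linux setup script does; names are examples.
python3 -m venv qwen-env
source qwen-env/bin/activate
pip install vllm

# vLLM downloads the model from Hugging Face on first run, then
# serves an OpenAI-compatible API on port 8000.
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000
```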

### Step 3: Connect to the App

#### Scenario A: Running the App Locally (localhost)

If you are running this web app on your own computer (`npm run dev`), simply set the Endpoint in the app to `http://localhost:8000/v1`.

#### Scenario B: Running the App on Hugging Face (HTTPS)

If you are accessing the web app via Hugging Face Spaces, you cannot connect to `localhost` directly, because browsers block mixed content (an HTTPS page calling a plain-HTTP endpoint). You must create a secure tunnel instead.

#### How to Tunnel

1. **Cloudflare Tunnel** (easiest):
   - Download `cloudflared`.
   - Run: `cloudflared tunnel --url http://localhost:8000`
   - Copy the URL ending in `.trycloudflare.com`.
2. **Paste the URL**:
   - Paste this secure URL into the **Local Endpoint** field in the web app.
   - Add `/v1` to the end (e.g., `https://example.trycloudflare.com/v1`).
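
Whichever scenario you use, the endpoint the app calls must end in `/v1`. A small sketch of that normalization (`normalizeEndpoint` is an illustrative name, not the app's actual function):

```typescript
// Ensure a user-supplied endpoint ends with the OpenAI-compatible /v1 path.
// Illustrative sketch; the app may handle this differently.
function normalizeEndpoint(url: string): string {
  const trimmed = url.trim().replace(/\/+$/, ""); // drop trailing slashes
  return trimmed.endsWith("/v1") ? trimmed : `${trimmed}/v1`;
}
```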

## 💻 Local Development (Web App)

### Prerequisites

- Node.js (v18+)
- npm

### Installation

1. Clone the repo:

   ```bash
   git clone <your-repo-url>
   cd lora-caption-assistant
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Run the app:

   ```bash
   npm run dev
   ```

   Open `http://localhost:5173` in your browser.

## License

MIT