Prompt Augmentation Tool

This tool uses Qwen3 to generate new positive/negative prompt pairs, using prompts sampled from the civitai_image.csv dataset as examples.

Features

Randomly samples existing prompts as examples for Qwen3
40% default probability of generating multi-character-focused prompts (configurable with --multi_char_prob)
Cleans prompts by removing technical embeddings and prefixes
Generates 10,000 new prompt pairs by default
Saves in batches (every 10 prompts by default) to limit data loss if the run is interrupted
Resume capability - automatically detects and continues from existing files
Saves results in JSONL format for easy processing
Includes progress tracking and error handling
Interrupt-safe - saves progress even if stopped with Ctrl+C
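
The sampling and multi-character selection listed above could look roughly like the following. This is a minimal sketch, not the script's actual code: the helper name pick_examples and the use of Python's random module are assumptions. Only the defaults (3 examples, 0.4 probability) come from this README.

```python
import random

def pick_examples(prompts, samples_per_batch=3, multi_char_prob=0.4, rng=None):
    """Hypothetical helper: choose example prompts and decide the focus."""
    rng = rng if rng is not None else random.Random()
    # Never request more examples than the dataset contains.
    k = min(samples_per_batch, len(prompts))
    examples = rng.sample(prompts, k)
    # rng.random() is uniform on [0, 1), so this is True with
    # probability multi_char_prob.
    multi_character_focus = rng.random() < multi_char_prob
    return examples, multi_character_focus
```

Passing a seeded random.Random instance as rng makes a run reproducible, which can help when debugging prompt quality.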

Usage

Basic Usage

cd prepare_tool/prompt_augmentation
python augment_prompts.py

Custom Parameters

python augment_prompts.py \
    --target_count 5000 \
    --multi_char_prob 0.3 \
    --save_every 20 \
    --output my_prompts.jsonl \
    --csv_path ../../civitai_image.csv

Resume Generation

If your generation is interrupted, simply run the same command again. The script will automatically detect existing prompts and continue from where it left off:

# This will continue from existing prompts_10k.jsonl if it exists
python augment_prompts.py --target_count 10000 --output prompts_10k.jsonl
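
One way such resume detection can work is to count the valid records already in the output file and generate only the remainder. The sketch below illustrates that idea under assumptions: count_existing and remaining are hypothetical names, and the real script may detect existing prompts differently.

```python
import json
import os

def count_existing(path):
    """Hypothetical helper: count valid JSON lines already in the output file."""
    if not os.path.exists(path):
        return 0
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
            except json.JSONDecodeError:
                # Skip a partial line left behind by an interrupted write.
                continue
            count += 1
    return count

def remaining(target_count, path):
    """Number of prompts still to generate to reach target_count."""
    return max(0, target_count - count_existing(path))
```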

Parameters

--model: Model name (default: "Qwen/Qwen3-8B")
--csv_path: Path to civitai CSV file (default: "../../civitai_image.csv")
--target_count: Number of prompts to generate (default: 10000)
--multi_char_prob: Probability of multi-character prompts (default: 0.4)
--samples_per_batch: Number of examples to show model (default: 3)
--save_every: Save to file every N successful generations (default: 10)
--output: Output file name (default: "augmented_prompts.jsonl")
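
For reference, a command-line interface with these flags and defaults can be declared with argparse as sketched below. The flag names and defaults are taken from the list above. The help strings, the types, and the build_parser function name are assumptions and may not match the script.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Generate augmented prompt pairs")
    parser.add_argument("--model", default="Qwen/Qwen3-8B", help="Model name")
    parser.add_argument("--csv_path", default="../../civitai_image.csv",
                        help="Path to civitai CSV file")
    parser.add_argument("--target_count", type=int, default=10000,
                        help="Number of prompts to generate")
    parser.add_argument("--multi_char_prob", type=float, default=0.4,
                        help="Probability of multi-character prompts")
    parser.add_argument("--samples_per_batch", type=int, default=3,
                        help="Number of examples to show the model")
    parser.add_argument("--save_every", type=int, default=10,
                        help="Save to file every N successful generations")
    parser.add_argument("--output", default="augmented_prompts.jsonl",
                        help="Output file name")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```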

Output Format

The script generates a JSONL file where each line is a JSON object:

{
    "positive_prompt": "detailed positive prompt here...",
    "negative_prompt": "negative prompt with quality controls",
    "multi_character_focus": true,
    "generation_attempt": 42,
    "sample_sources": ["sample 1...", "sample 2...", "sample 3..."]
}
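
Because each line is an independent JSON object, the file can be read back with the standard library alone. The following is a small sketch for downstream processing. The function names load_prompts and multi_character_share are illustrative and not part of the tool.

```python
import json

def load_prompts(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records

def multi_character_share(records):
    """Fraction of records flagged with multi_character_focus."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r.get("multi_character_focus"))
    return flagged / len(records)
```

With the default settings, multi_character_share(load_prompts("augmented_prompts.jsonl")) should land near 0.4 on a large run.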

Example Output

Generated prompt pairs look similar to the following (shown here without the metadata fields):

Multi-character focused:

{
    "positive_prompt": "masterpiece, best quality, 2girls, sitting together on park bench, one girl with long brown hair reading book aloud, other girl with short blonde hair listening intently, warm afternoon sunlight, cherry blossoms falling, detailed facial expressions, friendship, casual clothing, peaceful atmosphere",
    "negative_prompt": "worst quality, low quality, bad anatomy, bad hands, blurry, watermark, signature, text"
}

General prompt:

{
    "positive_prompt": "high quality, detailed illustration, mystical forest scene, ancient stone ruins covered in glowing moss, ethereal lighting through canopy, magical atmosphere, fantasy landscape, intricate details, vibrant colors",
    "negative_prompt": "low quality, bad anatomy, blurry, watermark, signature, worst quality"
}

Tips

Monitor Progress: The script shows progress every 100 attempts
Batch Saving: Results are saved every 10 successful generations by default
Resume Safely: You can interrupt (Ctrl+C) and resume generation anytime
Adjust Parameters: Lower --multi_char_prob if you want fewer multi-character prompts
Change Batch Size: Use --save_every to control how often data is saved
GPU Memory: The script uses "auto" device mapping, so make sure enough GPU memory is available for the model
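
The batch-saving and interrupt behavior described in these tips can be pictured with the sketch below. It is an illustration under assumptions, not the script's implementation: append_batch, run, and the generate_one callable are hypothetical names, and generate_one is assumed to return a dict on success or None on failure.

```python
import json

def append_batch(path, batch):
    """Append buffered records to the JSONL file, then clear the buffer."""
    if not batch:
        return
    with open(path, "a", encoding="utf-8") as f:
        for record in batch:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    batch.clear()

def run(generate_one, path, target_count, save_every=10):
    """Generate until target_count successes, saving every save_every records."""
    buffer, done = [], 0
    try:
        while done < target_count:
            record = generate_one()
            if record is None:
                continue  # failed attempt, try again
            buffer.append(record)
            done += 1
            if len(buffer) >= save_every:
                append_batch(path, buffer)
    except KeyboardInterrupt:
        # Ctrl+C: fall through so the finally block saves what is buffered.
        pass
    finally:
        append_batch(path, buffer)
    return done
```

A smaller save_every value means less work is at risk if the process is killed outright, at the cost of more frequent file writes.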

Requirements

transformers
torch
pandas
Python 3.9+ (transformers releases with Qwen3 support do not run on older Python versions)
Sufficient GPU memory for Qwen3-8B model