Prompt Augmentation Tool
This tool uses Qwen3 to generate new positive/negative prompt pairs, using prompts sampled from the civitai_image.csv dataset as examples.
Features
- Randomly samples existing prompts as examples for Qwen3
- Generates multi-character focused prompts with 40% probability by default (set with --multi_char_prob)
- Cleans prompts by removing technical embeddings and prefixes
- Generates 10,000 new prompt pairs by default
- Saves in batches (every 10 prompts by default) to guard against interruptions
- Resume capability - automatically detects and continues from existing files
- Saves results in JSONL format for easy processing
- Includes progress tracking and error handling
- Interrupt-safe - saves progress even if stopped with Ctrl+C
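The cleaning step listed above could look roughly like the sketch below. This README does not say which tokens count as "technical embeddings and prefixes", so the patterns here (LoRA-style angle-bracket tags, embedding: tokens, and a leading "Prompt:" label) and the function name clean_prompt are assumptions for illustration, not the script's actual rules.

```python
import re

# Assumed patterns; the real script may target different tokens.
LORA_TAG = re.compile(r"<(?:lora|lyco|hypernet):[^>]*>", re.IGNORECASE)
EMBEDDING_TOKEN = re.compile(r"\bembedding:[^\s,]+", re.IGNORECASE)
LEADING_LABEL = re.compile(
    r"^\s*(?:positive prompt|negative prompt|prompt)\s*:\s*", re.IGNORECASE
)


def clean_prompt(text: str) -> str:
    """Strip the assumed technical tokens and tidy the comma-separated tags."""
    text = LEADING_LABEL.sub("", text)
    text = LORA_TAG.sub("", text)
    text = EMBEDDING_TOKEN.sub("", text)
    # Split on commas, drop pieces left empty by the removals, then rejoin.
    parts = [part.strip() for part in text.split(",")]
    return ", ".join(part for part in parts if part)
```

Splitting on commas after the removals avoids leaving doubled or trailing commas where a tag used to be.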
Usage
Basic Usage
cd prepare_tool/prompt_augmentation
python augment_prompts.py
Custom Parameters
python augment_prompts.py \
--target_count 5000 \
--multi_char_prob 0.3 \
--save_every 20 \
--output my_prompts.jsonl \
--csv_path ../../civitai_image.csv
Resume Generation
If your generation is interrupted, simply run the same command again. The script will automatically detect existing prompts and continue from where it left off:
# This will continue from existing prompts_10k.jsonl if it exists
python augment_prompts.py --target_count 10000 --output prompts_10k.jsonl
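A minimal sketch of how resume and batch saving can work together, assuming the output file is append-only JSONL with one prompt pair per line. The helper names (count_existing, append_batch, generate) and the make_pair callback are hypothetical and stand in for the script's own generation logic.

```python
import json
import os


def count_existing(path):
    """Count non-empty lines (one prompt pair each) in an existing JSONL file."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def append_batch(path, records):
    """Append a batch of records, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def generate(path, target_count, save_every, make_pair):
    """Continue until the file holds target_count records, saving every save_every."""
    done = count_existing(path)  # resume point
    pending = []
    try:
        while done + len(pending) < target_count:
            pending.append(make_pair())  # make_pair is a hypothetical generator call
            if len(pending) >= save_every:
                append_batch(path, pending)
                done += len(pending)
                pending = []
    finally:
        # Runs on normal exit and on KeyboardInterrupt (Ctrl+C) alike.
        if pending:
            append_batch(path, pending)
            done += len(pending)
    return done
```

Because the file is only ever appended to, rerunning with the same --output picks up at the existing line count and does not overwrite earlier results.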
Parameters
- --model: Model name (default: "Qwen/Qwen3-8B")
- --csv_path: Path to civitai CSV file (default: "../../civitai_image.csv")
- --target_count: Number of prompts to generate (default: 10000)
- --multi_char_prob: Probability of multi-character prompts (default: 0.4)
- --samples_per_batch: Number of examples to show model (default: 3)
- --save_every: Save to file every N successful generations (default: 10)
- --output: Output file name (default: "augmented_prompts.jsonl")
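For reference, the flags and defaults listed above map naturally onto an argparse parser. The sketch below is an assumption about how they might be declared; the types and the build_parser name are illustrative and not taken from augment_prompts.py.

```python
import argparse


def build_parser():
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(description="Generate prompt pairs with Qwen3")
    parser.add_argument("--model", default="Qwen/Qwen3-8B")
    parser.add_argument("--csv_path", default="../../civitai_image.csv")
    parser.add_argument("--target_count", type=int, default=10000)
    parser.add_argument("--multi_char_prob", type=float, default=0.4)
    parser.add_argument("--samples_per_batch", type=int, default=3)
    parser.add_argument("--save_every", type=int, default=10)
    parser.add_argument("--output", default="augmented_prompts.jsonl")
    return parser
```

Any flag left off the command line falls back to its default, so `--target_count 5000` alone still saves every 10 generations to augmented_prompts.jsonl.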
Output Format
The script generates a JSONL file where each line is a JSON object:
{
"positive_prompt": "detailed positive prompt here...",
"negative_prompt": "negative prompt with quality controls",
"multi_character_focus": true,
"generation_attempt": 42,
"sample_sources": ["sample 1...", "sample 2...", "sample 3..."]
}
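Since each line is an independent JSON object, the file can be read back with the standard library alone. The sketch below loads a results file and reports the share of multi-character records. The function names are hypothetical, and only the multi_character_focus field shown above is assumed to exist.

```python
import json


def load_pairs(path):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    pairs = []
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                pairs.append(json.loads(line))
    return pairs


def multi_character_share(pairs):
    """Fraction of records flagged with multi_character_focus."""
    if not pairs:
        return 0.0
    flagged = sum(1 for pair in pairs if pair.get("multi_character_focus"))
    return flagged / len(pairs)
```

For example, `multi_character_share(load_pairs("augmented_prompts.jsonl"))` gives a quick check that the observed ratio is close to the configured --multi_char_prob.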
Example Output
Generated prompt pairs resemble the following (only the prompt fields are shown):
Multi-character focused:
{
"positive_prompt": "masterpiece, best quality, 2girls, sitting together on park bench, one girl with long brown hair reading book aloud, other girl with short blonde hair listening intently, warm afternoon sunlight, cherry blossoms falling, detailed facial expressions, friendship, casual clothing, peaceful atmosphere",
"negative_prompt": "worst quality, low quality, bad anatomy, bad hands, blurry, watermark, signature, text"
}
General prompt:
{
"positive_prompt": "high quality, detailed illustration, mystical forest scene, ancient stone ruins covered in glowing moss, ethereal lighting through canopy, magical atmosphere, fantasy landscape, intricate details, vibrant colors",
"negative_prompt": "low quality, bad anatomy, blurry, watermark, signature, worst quality"
}
Tips
- Monitor Progress: The script shows progress every 100 attempts
- Batch Saving: Results are saved every 10 successful generations by default
- Resume Safely: You can interrupt (Ctrl+C) and resume generation anytime
- Adjust Parameters: Lower --multi_char_prob if you want fewer multi-character prompts
- Change Batch Size: Use --save_every to control how often data is saved
- GPU Memory: The script uses "auto" device mapping, so make sure enough GPU memory is available
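To make the --multi_char_prob tip concrete, the sketch below assumes the script decides each prompt's focus with an independent random draw against that probability. This README does not confirm the mechanism, and choose_focus and expected_multi_character are hypothetical names.

```python
import random


def choose_focus(multi_char_prob, rng):
    """Return True when this generation should be multi-character focused."""
    return rng.random() < multi_char_prob


def expected_multi_character(target_count, multi_char_prob):
    """Expected number of multi-character prompts in a full run."""
    return target_count * multi_char_prob


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    flags = [choose_focus(0.3, rng) for _ in range(5000)]
    print("observed multi-character prompts:", sum(flags))
    print("expected multi-character prompts:", expected_multi_character(5000, 0.3))
```

Under this assumption, a run with --target_count 5000 and --multi_char_prob 0.3 yields about 1,500 multi-character prompts, with the exact count varying from run to run.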
Requirements
- transformers
- torch
- pandas
- Python 3.9+ (required by the transformers releases that support Qwen3)
- Sufficient GPU memory for the Qwen3-8B model