T5-Base AI Art Prompt Generator
Model Version: 1.0
Training Date: August 2025
Base Model: google/t5-base (220M parameters)
Framework: Hugging Face Transformers 4.53.3
Model Overview
This is a T5-base model fine-tuned specifically for AI art prompt generation and bidirectional prompt transformation. It can both elaborate simple descriptions into detailed artistic prompts and simplify complex prompts into their core concepts.
Key Capabilities
- Simple-to-Elaborate: Transform basic descriptions into rich, detailed art prompts
- Elaborate-to-Simple: Extract core concepts from complex prompts
- Bidirectional: Handles both directions of prompt transformation
- Multi-Platform: Trained on data from NightCafe, Civitai, and other AI art platforms
Model Architecture
Base Architecture: T5 (Text-To-Text Transfer Transformer)
- Parameters: 220,469,120 (220M)
- Encoder Layers: 12
- Decoder Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Feed Forward: 3072
- Vocabulary Size: 32,128 tokens
- Max Sequence Length: 512 tokens
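The figures above can be sanity-checked directly from the shipped checkpoint. A minimal sketch, assuming the local `./fine_tuned_t5_base` directory used in the Usage section below:

```python
from transformers import T5Config, T5ForConditionalGeneration

# Read the architecture out of the checkpoint's config and compare with the list above.
config = T5Config.from_pretrained("./fine_tuned_t5_base")
print(config.num_layers, config.num_decoder_layers)   # encoder/decoder layers: 12, 12
print(config.num_heads, config.d_model, config.d_ff)  # heads, hidden, feed-forward: 12, 768, 3072
print(config.vocab_size)                              # 32128

# Count parameters (~220M).
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")
print(sum(p.numel() for p in model.parameters()))
```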
Training Details
Dataset
- Training Samples: 48,034 high-quality prompt pairs
- Validation Samples: 5,338 samples
- Sources: Multi-platform (NightCafe, Civitai, Community datasets)
- Bias Protection: Implemented saturation limits to prevent "beautiful woman" oversaturation
- Quality Filtering: Length-based, engagement-based, and metadata-based filtering
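The filtering stage is not shipped with the model, but its logic can be sketched. The field names and thresholds below are illustrative assumptions, not the exact pipeline:

```python
# Illustrative quality filter; field names and thresholds are assumptions.
def keep_sample(pair: dict) -> bool:
    prompt = pair["elaborate"]
    # Length-based: drop trivially short or runaway-long prompts.
    if not 20 <= len(prompt) <= 2000:
        return False
    # Engagement-based: require minimal community engagement where available.
    if pair.get("likes", 0) < 5:
        return False
    # Metadata-based: require a known source platform.
    return pair.get("platform") in {"nightcafe", "civitai", "community"}
```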
Training Configuration
- Epochs: 5
- Batch Size: 4 (per device)
- Learning Rate: 1e-4 (0.0001)
- Optimizer: AdamW
- Final Training Loss: 0.3969
- Final Validation Loss: 0.4293
- Hardware: CUDA-enabled GPU training
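In Transformers terms, this configuration maps onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch; the dataset wiring and any unlisted arguments (warmup, weight decay, eval cadence) are assumptions:

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

args = Seq2SeqTrainingArguments(
    output_dir="./fine_tuned_t5_base",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=1e-4,
    optim="adamw_torch",  # AdamW
)
trainer = Seq2SeqTrainer(
    model=model,             # T5ForConditionalGeneration
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized prompt-pair datasets
    eval_dataset=val_ds,
)
trainer.train()
```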
Bias Protection System
The model was trained with strict bias protection limits:
- Appearance descriptors: Max 5% ("beautiful", "gorgeous", etc.)
- Gender representation: Balanced male/female ratios
- Model diversity: Max 5K samples per AI model
- Author diversity: Max 1K samples per creator
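A minimal sketch of how such saturation limits can be enforced during dataset assembly; the descriptor list and matching rules are assumptions beyond the caps stated above:

```python
from collections import Counter

APPEARANCE_TERMS = {"beautiful", "gorgeous", "stunning", "pretty"}  # assumed list
MAX_PER_MODEL, MAX_PER_AUTHOR = 5_000, 1_000

def apply_saturation_limits(samples):
    kept, appearance_kept = [], 0
    per_model, per_author = Counter(), Counter()
    for s in samples:
        # Enforce per-model and per-author diversity caps.
        if per_model[s["model"]] >= MAX_PER_MODEL or per_author[s["author"]] >= MAX_PER_AUTHOR:
            continue
        has_appearance = any(t in s["elaborate"].lower() for t in APPEARANCE_TERMS)
        # Cap appearance-descriptor prompts at 5% of the kept set.
        if has_appearance and appearance_kept >= 0.05 * max(len(kept), 1):
            continue
        kept.append(s)
        appearance_kept += has_appearance
        per_model[s["model"]] += 1
        per_author[s["author"]] += 1
    return kept
```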
Performance Examples
Simple-to-Elaborate Transformation
Input: A cat sitting on a table
Output: A Millennial cat enjoying a newspaper by the window with a cup of tea nearby. The cat is wearing a cozy sweater and has a relaxed expression. The room is decorated with plants, books, and a cozy workspace.
Input: A futuristic city at night
Output: A futuristic cityscape at night, with towering skyscrapers piercing the night sky, illuminated by the soft glow of neon signs and holographic advertisements. The scene is reminiscent of Syd Mead's visionary cityscapes, with a touch of H.R. Giger's biomechanical horror, creating a mesmerizing and awe-inspiring scene.
Elaborate-to-Simple Transformation
Input: A majestic golden dragon soaring through storm clouds above a medieval castle, with lightning illuminating its scales in photorealistic detail
Output: A dragon flying over a castle with lightning in the background
Usage
Quick Start
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")

# Generate an elaborate prompt from a simple description
input_text = "Generate a detailed artistic prompt for: cat on table"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=256, num_beams=4)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
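The reverse direction works the same way. Note that the task prefix below is an assumption; check `training_info.txt` for the exact prefix used during training:

```python
# Elaborate-to-simple, continuing from the Quick Start above.
# The "Simplify this prompt:" prefix is an assumed placeholder.
input_text = "Simplify this prompt: A majestic golden dragon soaring through storm clouds above a medieval castle"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```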
Using the Test Interface
```bash
# Interactive mode
python3 test_model.py --model fine_tuned_t5_base --interactive

# Batch testing
python3 test_model.py --model fine_tuned_t5_base --batch

# Single transformations
python3 test_model.py --model fine_tuned_t5_base --elaborate "dragon in the sky"
python3 test_model.py --model fine_tuned_t5_base --simplify "hyperrealistic dragon..."
```
Performance Characteristics
Model Size vs Performance
- Parameters: 220M (vs. 60M for T5-small)
- Inference Speed: ~2.3x slower than T5-small
- Output Quality: Significantly improved detail and coherence
- Memory Usage: ~850MB GPU memory
- CPU Inference: Suitable for real-time applications
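To reproduce the latency comparison on your own hardware, a rough timing sketch, continuing from the Quick Start (results vary with device, input length, and beam settings):

```python
import time
import torch

def time_generate(model, tokenizer, text, runs=10):
    inputs = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        model.generate(inputs, max_length=256, num_beams=4)  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            model.generate(inputs, max_length=256, num_beams=4)
    return (time.perf_counter() - start) / runs  # mean seconds per generation

print(time_generate(model, tokenizer, "Generate a detailed artistic prompt for: cat on table"))
```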
Generation Parameters
- Recommended Max Length: 256 tokens
- Optimal Beam Search: 4 beams
- Temperature: 1.0 with sampling disabled (deterministic) or 1.1-1.3 with sampling enabled (more creative)
- Do Sample: False for consistent outputs, True for variety (both presets are sketched below)
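Both presets as `generate()` calls, continuing from the Quick Start (`top_p` is an optional extra, not part of the list above):

```python
# Deterministic: beam search, no sampling.
outputs = model.generate(inputs, max_length=256, num_beams=4, do_sample=False)

# Creative: sampling with a mildly raised temperature.
outputs = model.generate(inputs, max_length=256, do_sample=True, temperature=1.2, top_p=0.95)
```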
Technical Specifications
Model Files
- config.json: Model architecture configuration
- model.safetensors: Model weights (850MB)
- tokenizer_config.json: Tokenizer configuration
- spiece.model: SentencePiece vocabulary
- generation_config.json: Default generation parameters
- training_info.txt: Training metrics and details
Hardware Requirements
- Minimum: 2GB RAM, CPU-only inference possible
- Recommended: 4GB GPU memory for optimal performance
- Training: 8GB+ GPU memory (for further fine-tuning)
Compatibility
- Transformers: 4.20.0+ (tested with 4.53.3)
- PyTorch: 1.10.0+
- Python: 3.8+
- ONNX: Convertible for cross-platform deployment
- OpenVINO: Compatible for Intel hardware acceleration
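As one path to ONNX, Hugging Face Optimum can export the checkpoint. This assumes `optimum[onnxruntime]` is installed; it is not part of this repository:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

# Export to ONNX and run inference through ONNX Runtime.
ort_model = ORTModelForSeq2SeqLM.from_pretrained("./fine_tuned_t5_base", export=True)
ort_model.save_pretrained("./fine_tuned_t5_base_onnx")

tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
inputs = tokenizer("Generate a detailed artistic prompt for: cat on table", return_tensors="pt")
print(tokenizer.decode(ort_model.generate(**inputs)[0], skip_special_tokens=True))
```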
Quality Metrics
Training Performance
- Convergence: Smooth loss reduction over 5 epochs
- Validation Stability: No significant overfitting observed
- Loss Improvement: 63% reduction from initial to final loss
Output Quality Assessment
- Coherence: High semantic consistency in generated prompts
- Creativity: Balanced between variety and plausibility
- Bias Control: Successfully maintains diversity targets
- Length Appropriateness: Generates contextually appropriate detail levels
Use Cases
Primary Applications
- AI Art Prompt Enhancement: Transform simple ideas into detailed prompts
- Prompt Simplification: Extract core concepts from complex descriptions
- Creative Writing: Generate artistic scene descriptions
- Content Creation: Assist with visual storytelling
- Educational: Teach prompt engineering principles
Integration Scenarios
- Web Applications: Real-time prompt enhancement
- Creative Tools: Plugin for art generation software
- Content Pipelines: Automated prompt processing
- Research: Prompt engineering and bias studies
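For the web-application scenario, a hypothetical minimal endpoint (FastAPI is an assumption, not something shipped with this project):

```python
from fastapi import FastAPI
from transformers import T5Tokenizer, T5ForConditionalGeneration

app = FastAPI()
tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")

@app.get("/elaborate")
def elaborate(text: str):
    # Prefix matches the Quick Start example above.
    inputs = tokenizer.encode(f"Generate a detailed artistic prompt for: {text}", return_tensors="pt")
    outputs = model.generate(inputs, max_length=256, num_beams=4)
    return {"prompt": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```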
Limitations
Known Issues
- Repetition: Occasionally generates repetitive LoRA tags (see the decode-time workaround after this list; better data filtering would address the root cause)
- Context Overflow: Very long inputs may be truncated
- Domain Specificity: Optimized for AI art, may not generalize to other domains
- Training Data Bias: Despite protection, some biases may remain
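Short of retraining with better filtering, the repetition issue can be damped at decode time with standard Transformers generation options, continuing from the Quick Start:

```python
# Decode-time workaround for repetitive tags; not a substitute for data filtering.
outputs = model.generate(
    inputs,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,  # block any 3-gram from repeating
    repetition_penalty=1.2,  # mildly discourage token reuse
)
```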
Performance Considerations
- Memory: Requires significant memory for batch processing
- Speed: Slower than smaller models (T5-small)
- Consistency: Deterministic generation may lack variety
Version History
v1.0 (August 2025)
- Initial release with T5-base architecture
- Multi-platform training data integration
- Bias protection system implementation
- 48K+ training samples with quality filtering
License & Attribution
- Base Model: google/t5-base (Apache 2.0)
- Training Data: Community sources (NightCafe, Civitai)
- Fine-tuned Model: Open source research use
- Commercial Use: Please verify platform ToS compliance
Acknowledgments
- Google: T5 architecture and base model
- Hugging Face: Transformers library and model hosting
- NightCafe Studio: API access for training data
- Civitai Community: Open model and prompt sharing
- Community Contributors: Prompt creation and curation
Generate better AI art prompts with intelligent, bias-aware prompt transformation!
For issues, feature requests, or contributions, please see the main project repository.