T5-Base AI Art Prompt Generator
Model Version: 1.0
Training Date: August 2025
Base Model: google/t5-base (220M parameters)
Framework: Hugging Face Transformers 4.53.3
Model Overview
This is a T5-base model fine-tuned specifically for AI art prompt generation and bidirectional prompt transformation. It can both elaborate simple descriptions into detailed artistic prompts and simplify complex prompts into their core concepts.
Key Capabilities
- Simple-to-Elaborate: Transform basic descriptions into rich, detailed art prompts
- Elaborate-to-Simple: Extract core concepts from complex prompts
- Bidirectional: Handles both directions of prompt transformation
- Multi-Platform: Trained on data from NightCafe, Civitai, and other AI art platforms
Model Architecture
Base Architecture: T5 (Text-To-Text Transfer Transformer)
- Parameters: 220,469,120 (220M)
- Encoder Layers: 12
- Decoder Layers: 12
- Attention Heads: 12
- Hidden Size: 768
- Feed Forward: 3072
- Vocabulary Size: 32,128 tokens
- Max Sequence Length: 512 tokens
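The figures above can be sanity-checked directly from the shipped checkpoint. A minimal sketch, assuming the local `./fine_tuned_t5_base` directory used in the Usage section below:

```python
from transformers import T5Config, T5ForConditionalGeneration

# Read the architecture out of the checkpoint's config and compare with the list above.
config = T5Config.from_pretrained("./fine_tuned_t5_base")
print(config.num_layers, config.num_decoder_layers)   # encoder/decoder layers: 12, 12
print(config.num_heads, config.d_model, config.d_ff)  # heads, hidden, feed-forward: 12, 768, 3072
print(config.vocab_size)                              # 32128

# Count parameters (~220M).
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")
print(sum(p.numel() for p in model.parameters()))
```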
Training Details
Dataset
- Training Samples: 48,034 high-quality prompt pairs
- Validation Samples: 5,338 samples
- Sources: Multi-platform (NightCafe, Civitai, Community datasets)
- Bias Protection: Implemented saturation limits to prevent "beautiful woman" oversaturation
- Quality Filtering: Length-based, engagement-based, and metadata-based filtering
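The filtering stage is not shipped with the model, but its logic can be sketched. The field names and thresholds below are illustrative assumptions, not the exact pipeline:

```python
# Illustrative quality filter; field names and thresholds are assumptions.
def keep_sample(pair: dict) -> bool:
    prompt = pair["elaborate"]
    # Length-based: drop trivially short or runaway-long prompts.
    if not 20 <= len(prompt) <= 2000:
        return False
    # Engagement-based: require minimal community engagement where available.
    if pair.get("likes", 0) < 5:
        return False
    # Metadata-based: require a known source platform.
    return pair.get("platform") in {"nightcafe", "civitai", "community"}
```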
Training Configuration
- Epochs: 5
- Batch Size: 4 (per device)
- Learning Rate: 1e-4 (0.0001)
- Optimizer: AdamW
- Final Training Loss: 0.3969
- Final Validation Loss: 0.4293
- Hardware: CUDA-enabled GPU training
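In Transformers terms, this configuration maps onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch; the dataset wiring and any unlisted arguments (warmup, weight decay, eval cadence) are assumptions:

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

args = Seq2SeqTrainingArguments(
    output_dir="./fine_tuned_t5_base",
    num_train_epochs=5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=1e-4,
    optim="adamw_torch",  # AdamW
)
trainer = Seq2SeqTrainer(
    model=model,             # T5ForConditionalGeneration
    args=args,
    train_dataset=train_ds,  # hypothetical tokenized prompt-pair datasets
    eval_dataset=val_ds,
)
trainer.train()
```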
Bias Protection System
The model was trained with strict bias protection limits:
- Appearance descriptors: Max 5% ("beautiful", "gorgeous", etc.)
- Gender representation: Balanced male/female ratios
- Model diversity: Max 5K samples per AI model
- Author diversity: Max 1K samples per creator
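A minimal sketch of how such saturation limits can be enforced during dataset assembly; the descriptor list and matching rules are assumptions beyond the caps stated above:

```python
from collections import Counter

APPEARANCE_TERMS = {"beautiful", "gorgeous", "stunning", "pretty"}  # assumed list
MAX_PER_MODEL, MAX_PER_AUTHOR = 5_000, 1_000

def apply_saturation_limits(samples):
    kept, appearance_kept = [], 0
    per_model, per_author = Counter(), Counter()
    for s in samples:
        # Enforce per-model and per-author diversity caps.
        if per_model[s["model"]] >= MAX_PER_MODEL or per_author[s["author"]] >= MAX_PER_AUTHOR:
            continue
        has_appearance = any(t in s["elaborate"].lower() for t in APPEARANCE_TERMS)
        # Cap appearance-descriptor prompts at 5% of the kept set.
        if has_appearance and appearance_kept >= 0.05 * max(len(kept), 1):
            continue
        kept.append(s)
        appearance_kept += has_appearance
        per_model[s["model"]] += 1
        per_author[s["author"]] += 1
    return kept
```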
Performance Examples
Simple-to-Elaborate Transformation
Input: A cat sitting on a table
Output: A Millennial cat enjoying a newspaper by the window with a cup of tea nearby. The cat is wearing a cozy sweater and has a relaxed expression. The room is decorated with plants, books, and a cozy workspace.
Input: A futuristic city at night
Output: A futuristic cityscape at night, with towering skyscrapers piercing the night sky, illuminated by the soft glow of neon signs and holographic advertisements. The scene is reminiscent of Syd Mead's visionary cityscapes, with a touch of H.R. Giger's biomechanical horror, creating a mesmerizing and awe-inspiring scene.
Elaborate-to-Simple Transformation
Input: A majestic golden dragon soaring through storm clouds above a medieval castle, with lightning illuminating its scales in photorealistic detail
Output: A dragon flying over a castle with lightning in the background
Usage
Quick Start
```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the fine-tuned model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")

# Generate an elaborate prompt from a simple description
input_text = "Generate a detailed artistic prompt for: cat on table"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=256, num_beams=4)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
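The reverse direction works the same way. Note that the task prefix below is an assumption; check `training_info.txt` for the exact prefix used during training:

```python
# Elaborate-to-simple, continuing from the Quick Start above.
# The "Simplify this prompt:" prefix is an assumed placeholder.
input_text = "Simplify this prompt: A majestic golden dragon soaring through storm clouds above a medieval castle"
inputs = tokenizer.encode(input_text, return_tensors="pt")
outputs = model.generate(inputs, max_length=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```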
Using the Test Interface
```bash
# Interactive mode
python3 test_model.py --model fine_tuned_t5_base --interactive

# Batch testing
python3 test_model.py --model fine_tuned_t5_base --batch

# Single transformations
python3 test_model.py --model fine_tuned_t5_base --elaborate "dragon in the sky"
python3 test_model.py --model fine_tuned_t5_base --simplify "hyperrealistic dragon..."
```
Performance Characteristics
Model Size vs Performance
- Parameters: 220M (vs. 60M for T5-small)
- Inference Speed: ~2.3x slower than T5-small
- Output Quality: Significantly improved detail and coherence
- Memory Usage: ~850MB GPU memory
- CPU Inference: Suitable for real-time applications
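To reproduce the latency comparison on your own hardware, a rough timing sketch, continuing from the Quick Start (results vary with device, input length, and beam settings):

```python
import time
import torch

def time_generate(model, tokenizer, text, runs=10):
    inputs = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        model.generate(inputs, max_length=256, num_beams=4)  # warm-up
        start = time.perf_counter()
        for _ in range(runs):
            model.generate(inputs, max_length=256, num_beams=4)
    return (time.perf_counter() - start) / runs  # mean seconds per generation

print(time_generate(model, tokenizer, "Generate a detailed artistic prompt for: cat on table"))
```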
Generation Parameters
- Recommended Max Length: 256 tokens
- Optimal Beam Search: 4 beams
- Temperature: 1.0 with sampling disabled (deterministic) or 1.1-1.3 with sampling enabled (more creative)
- Do Sample: False for consistent outputs, True for variety (both presets are sketched below)
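Both presets as `generate()` calls, continuing from the Quick Start (`top_p` is an optional extra, not part of the list above):

```python
# Deterministic: beam search, no sampling.
outputs = model.generate(inputs, max_length=256, num_beams=4, do_sample=False)

# Creative: sampling with a mildly raised temperature.
outputs = model.generate(inputs, max_length=256, do_sample=True, temperature=1.2, top_p=0.95)
```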
Technical Specifications
Model Files
- config.json: Model architecture configuration
- model.safetensors: Model weights (850MB)
- tokenizer_config.json: Tokenizer configuration
- spiece.model: SentencePiece vocabulary
- generation_config.json: Default generation parameters
- training_info.txt: Training metrics and details
Hardware Requirements
- Minimum: 2GB RAM, CPU-only inference possible
- Recommended: 4GB GPU memory for optimal performance
- Training: 8GB+ GPU memory (for further fine-tuning)
Compatibility
- Transformers: 4.20.0+ (tested with 4.53.3)
- PyTorch: 1.10.0+
- Python: 3.8+
- ONNX: Convertible for cross-platform deployment
- OpenVINO: Compatible for Intel hardware acceleration
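As one path to ONNX, Hugging Face Optimum can export the checkpoint. This assumes `optimum[onnxruntime]` is installed; it is not part of this repository:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import T5Tokenizer

# Export to ONNX and run inference through ONNX Runtime.
ort_model = ORTModelForSeq2SeqLM.from_pretrained("./fine_tuned_t5_base", export=True)
ort_model.save_pretrained("./fine_tuned_t5_base_onnx")

tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
inputs = tokenizer("Generate a detailed artistic prompt for: cat on table", return_tensors="pt")
print(tokenizer.decode(ort_model.generate(**inputs)[0], skip_special_tokens=True))
```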
Quality Metrics
Training Performance
- Convergence: Smooth loss reduction over 5 epochs
- Validation Stability: No significant overfitting observed
- Loss Improvement: 63% reduction from initial to final loss
Output Quality Assessment
- Coherence: High semantic consistency in generated prompts
- Creativity: Balanced between variety and plausibility
- Bias Control: Successfully maintains diversity targets
- Length Appropriateness: Generates contextually appropriate detail levels
Use Cases
Primary Applications
- AI Art Prompt Enhancement: Transform simple ideas into detailed prompts
- Prompt Simplification: Extract core concepts from complex descriptions
- Creative Writing: Generate artistic scene descriptions
- Content Creation: Assist with visual storytelling
- Educational: Teach prompt engineering principles
Integration Scenarios
- Web Applications: Real-time prompt enhancement
- Creative Tools: Plugin for art generation software
- Content Pipelines: Automated prompt processing
- Research: Prompt engineering and bias studies
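For the web-application scenario, a hypothetical minimal endpoint (FastAPI is an assumption, not something shipped with this project):

```python
from fastapi import FastAPI
from transformers import T5Tokenizer, T5ForConditionalGeneration

app = FastAPI()
tokenizer = T5Tokenizer.from_pretrained("./fine_tuned_t5_base")
model = T5ForConditionalGeneration.from_pretrained("./fine_tuned_t5_base")

@app.get("/elaborate")
def elaborate(text: str):
    # Prefix matches the Quick Start example above.
    inputs = tokenizer.encode(f"Generate a detailed artistic prompt for: {text}", return_tensors="pt")
    outputs = model.generate(inputs, max_length=256, num_beams=4)
    return {"prompt": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```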
Limitations
Known Issues
- Repetition: Occasionally generates repetitive LoRA tags (see the decode-time workaround after this list; better data filtering would address the root cause)
- Context Overflow: Very long inputs may be truncated
- Domain Specificity: Optimized for AI art, may not generalize to other domains
- Training Data Bias: Despite protection, some biases may remain
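Short of retraining with better filtering, the repetition issue can be damped at decode time with standard Transformers generation options, continuing from the Quick Start:

```python
# Decode-time workaround for repetitive tags; not a substitute for data filtering.
outputs = model.generate(
    inputs,
    max_length=256,
    num_beams=4,
    no_repeat_ngram_size=3,  # block any 3-gram from repeating
    repetition_penalty=1.2,  # mildly discourage token reuse
)
```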
Performance Considerations
- Memory: Requires significant memory for batch processing
- Speed: Slower than smaller models (T5-small)
- Consistency: Deterministic generation may lack variety
Version History
v1.0 (August 2025)
- Initial release with T5-base architecture
- Multi-platform training data integration
- Bias protection system implementation
- 48K+ training samples with quality filtering
License & Attribution
- Base Model: google/t5-base (Apache 2.0)
- Training Data: Community sources (NightCafe, Civitai)
- Fine-tuned Model: Open source research use
- Commercial Use: Please verify platform ToS compliance
Acknowledgments
- Google: T5 architecture and base model
- Hugging Face: Transformers library and model hosting
- NightCafe Studio: API access for training data
- Civitai Community: Open model and prompt sharing
- Community Contributors: Prompt creation and curation
Generate better AI art prompts with intelligent, bias-aware prompt transformation!
For issues, feature requests, or contributions, please see the main project repository.