|
|
--- |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- text-to-image |
|
|
- image-generation |
|
|
- diffusion |
|
|
- stable-diffusion |
|
|
- ai-art |
|
|
- generative-ai |
|
|
pipeline_tag: text-to-image |
|
|
language: |
|
|
- en |
|
|
library_name: diffusers |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
 |
|
|
|
|
|
# 🎨 Trouter-Imagine-1 |
|
|
|
|
|
### *Transform Your Words Into Stunning Visual Art* |
|
|
|
|
|
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)


[Python](https://www.python.org/)


[Hugging Face](https://huggingface.co/)
|
|
|
|
|
**High-quality text-to-image generation powered by advanced diffusion models** |
|
|
|
|
|
[🚀 Quick Start](#how-to-use) • [📚 Documentation](#model-description) • [💡 Examples](#example-prompts) • [🎯 Features](#key-features) |
|
|
|
|
|
--- |
|
|
|
|
|
</div> |
|
|
|
|
|
# OpenTrouter/Trouter-Imagine-1 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**Trouter-Imagine-1** is a high-quality text-to-image generation model based on diffusion architecture, licensed under Apache 2.0. This model transforms natural language descriptions into detailed, photorealistic images across a wide variety of styles and subjects. |
|
|
|
|
|
### Key Features |
|
|
|
|
|
- **High Resolution Output**: Generates images up to 1024x1024 pixels with exceptional detail |
|
|
- **Versatile Style Range**: From photorealistic to artistic, anime to abstract |
|
|
- **Fast Inference**: Optimized for efficient generation with adjustable quality/speed tradeoffs |
|
|
- **Open Source**: Apache 2.0 licensed for commercial and personal use |
|
|
- **Fine-grained Control**: Advanced parameters for guidance scale, steps, and negative prompts |
|
|
|
|
|
## Model Architecture |
|
|
|
|
|
Based on latent diffusion model architecture with the following specifications: |
|
|
|
|
|
- **Base Architecture**: Stable Diffusion variant |
|
|
- **VAE**: Variational Autoencoder for latent space compression |
|
|
- **Text Encoder**: CLIP-based text understanding |
|
|
- **UNet**: Denoising diffusion model with attention mechanisms |
|
|
- **Training Resolution**: 512x512 base with multi-resolution support |
|
|
- **Parameters**: ~1.5B total parameters |
|
|
- **Inference Steps**: 20-50 recommended (adjustable) |
|
|
|
|
|
## Intended Use |
|
|
|
|
|
### Primary Use Cases |
|
|
|
|
|
1. **Creative Content Generation** |
|
|
- Digital art creation |
|
|
- Concept visualization |
|
|
- Storyboarding and prototyping |
|
|
- Marketing and advertising materials |
|
|
- Social media content |
|
|
|
|
|
2. **Professional Applications** |
|
|
- Product design mockups |
|
|
- Architectural visualization |
|
|
- Fashion design concepts |
|
|
- Game asset generation |
|
|
- Film and animation pre-production |
|
|
|
|
|
3. **Educational & Research** |
|
|
- AI research and experimentation |
|
|
- Teaching image synthesis concepts |
|
|
- Exploring generative AI capabilities |
|
|
- Academic studies on diffusion models |
|
|
|
|
|
### Out-of-Scope Uses |
|
|
|
|
|
- Generation of deepfakes or misleading content |
|
|
- Creating content that violates copyright or trademarks |
|
|
- Generating illegal, harmful, or offensive material |
|
|
- Medical diagnosis or healthcare decisions |
|
|
- Biometric identification systems |
|
|
|
|
|
## How to Use |
|
|
|
|
|
### Basic Usage with Diffusers |
|
|
|
|
|
```python |
|
|
from diffusers import StableDiffusionPipeline |
|
|
import torch |
|
|
|
|
|
# Load the model |
|
|
model_id = "OpenTrouter/Trouter-Imagine-1" |
|
|
pipe = StableDiffusionPipeline.from_pretrained( |
|
|
model_id, |
|
|
torch_dtype=torch.float16, |
|
|
    safety_checker=None  # disables the built-in safety filter; keep it enabled for public-facing apps
|
|
) |
|
|
pipe = pipe.to("cuda") |
|
|
|
|
|
# Generate an image |
|
|
prompt = "a serene mountain landscape at sunset, oil painting style, highly detailed" |
|
|
negative_prompt = "blurry, low quality, distorted" |
|
|
|
|
|
image = pipe( |
|
|
prompt=prompt, |
|
|
negative_prompt=negative_prompt, |
|
|
num_inference_steps=30, |
|
|
guidance_scale=7.5, |
|
|
height=1024, |
|
|
width=1024 |
|
|
).images[0] |
|
|
|
|
|
image.save("output.png") |
|
|
``` |
|
|
|
|
|
### Advanced Usage with Custom Parameters |
|
|
|
|
|
```python |
|
|
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler |
|
|
import torch |
|
|
|
|
|
model_id = "OpenTrouter/Trouter-Imagine-1" |
|
|
pipe = StableDiffusionPipeline.from_pretrained( |
|
|
model_id, |
|
|
torch_dtype=torch.float16 |
|
|
) |
|
|
|
|
|
# Use DPM-Solver for faster inference |
|
|
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config) |
|
|
pipe = pipe.to("cuda") |
|
|
|
|
|
# Enable memory optimizations |
|
|
pipe.enable_attention_slicing() |
|
|
pipe.enable_vae_slicing() |
|
|
|
|
|
# Generate with custom seed for reproducibility |
|
|
generator = torch.Generator("cuda").manual_seed(42) |
|
|
|
|
|
prompt = "futuristic cyberpunk city at night, neon lights, rainy streets, cinematic" |
|
|
negative_prompt = "daytime, sunny, bright, washed out, overexposed" |
|
|
|
|
|
image = pipe( |
|
|
prompt=prompt, |
|
|
negative_prompt=negative_prompt, |
|
|
num_inference_steps=25, |
|
|
guidance_scale=8.0, |
|
|
height=768, |
|
|
width=768, |
|
|
generator=generator, |
|
|
num_images_per_prompt=1 |
|
|
).images[0] |
|
|
|
|
|
image.save("cyberpunk_city.png") |
|
|
``` |
|
|
|
|
|
### Batch Generation |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from diffusers import StableDiffusionPipeline |
|
|
|
|
|
model_id = "OpenTrouter/Trouter-Imagine-1" |
|
|
pipe = StableDiffusionPipeline.from_pretrained( |
|
|
model_id, |
|
|
torch_dtype=torch.float16 |
|
|
).to("cuda") |
|
|
|
|
|
prompts = [ |
|
|
"a majestic lion in the savanna", |
|
|
"a cozy cabin in the snowy mountains", |
|
|
"a vibrant coral reef underwater scene", |
|
|
"a steampunk airship in the clouds" |
|
|
] |
|
|
|
|
|
for i, prompt in enumerate(prompts): |
|
|
image = pipe( |
|
|
prompt=prompt, |
|
|
num_inference_steps=30, |
|
|
guidance_scale=7.5 |
|
|
).images[0] |
|
|
image.save(f"batch_output_{i}.png") |
|
|
``` |
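
The loop above generates one image per call. `StableDiffusionPipeline` also accepts a list of prompts in a single call, which is faster but uses more VRAM per call; a small chunking helper (a sketch, the function name is ours) keeps each call's batch size bounded:

```python
def chunked(prompts, batch_size):
    """Split a prompt list into fixed-size batches so each pipeline call stays within VRAM limits."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

# Each chunk can then be passed to the pipeline in one call, e.g.:
#   for batch in chunked(prompts, 2):
#       images = pipe(prompt=batch, num_inference_steps=30).images
batches = list(chunked(["lion", "cabin", "reef", "airship"], 2))
```

A batch size of 2 to 4 is a reasonable starting point on a 12GB GPU; reduce it if you hit out-of-memory errors.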
|
|
|
|
|
### Using with API |
|
|
|
|
|
```python |
|
|
import requests |
|
|
from PIL import Image |
|
|
import io |
|
|
|
|
|
API_URL = "https://api-inference.huggingface.co/models/OpenTrouter/Trouter-Imagine-1" |
|
|
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"} |
|
|
|
|
|
def query(payload):

    response = requests.post(API_URL, headers=headers, json=payload)

    response.raise_for_status()  # fail loudly on API errors instead of saving an error body as an image

    return response.content
|
|
|
|
|
image_bytes = query({ |
|
|
"inputs": "astronaut riding a horse on mars, photorealistic, 4k", |
|
|
"parameters": { |
|
|
"negative_prompt": "cartoon, anime, low quality", |
|
|
"num_inference_steps": 30, |
|
|
"guidance_scale": 7.5 |
|
|
} |
|
|
}) |
|
|
|
|
|
image = Image.open(io.BytesIO(image_bytes)) |
|
|
image.save("astronaut_mars.png") |
|
|
``` |
|
|
|
|
|
## Parameters Guide |
|
|
|
|
|
### Essential Parameters |
|
|
|
|
|
| Parameter | Type | Default | Description | |
|
|
|-----------|------|---------|-------------| |
|
|
| `prompt` | string | required | The text description of the desired image | |
|
|
| `negative_prompt` | string | "" | What to avoid in the generation | |
|
|
| `num_inference_steps` | int | 30 | Number of denoising steps (20-50 recommended) | |
|
|
| `guidance_scale` | float | 7.5 | How strictly to follow the prompt (5.0-15.0) | |
|
|
| `width` | int | 512 | Output image width (64-1024, multiples of 8) | |
|
|
| `height` | int | 512 | Output image height (64-1024, multiples of 8) | |
|
|
| `generator` | torch.Generator | None | Seeded generator for reproducibility, e.g. `torch.Generator("cuda").manual_seed(42)` |
|
|
|
|
|
### Parameter Tips |
|
|
|
|
|
**Inference Steps:** |
|
|
- 20-25: Fast, good quality for previews |
|
|
- 30-40: Balanced quality/speed |
|
|
- 50+: Maximum quality, slower generation |
|
|
|
|
|
**Guidance Scale:** |
|
|
- 5.0-7.0: More creative, varied results |
|
|
- 7.5-10.0: Balanced adherence to prompt |
|
|
- 10.0-15.0: Strict prompt following, less variation |
|
|
|
|
|
**Resolution:** |
|
|
- 512x512: Fastest, standard quality |
|
|
- 768x768: High quality, moderate speed |
|
|
- 1024x1024: Maximum quality, slower |
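
The step, guidance, and resolution tiers above can be bundled into presets and unpacked into a pipeline call. A minimal sketch (the preset names are illustrative, not part of the model's API):

```python
# Illustrative quality presets built from the recommended ranges above.
PRESETS = {
    "preview":  {"num_inference_steps": 20, "guidance_scale": 7.5, "height": 512,  "width": 512},
    "balanced": {"num_inference_steps": 30, "guidance_scale": 7.5, "height": 768,  "width": 768},
    "final":    {"num_inference_steps": 50, "guidance_scale": 9.0, "height": 1024, "width": 1024},
}

def settings_for(preset: str) -> dict:
    """Return keyword arguments to unpack into a call: pipe(prompt, **settings_for("final"))."""
    return PRESETS[preset]
```

Iterating on a prompt with `"preview"` and rendering the keeper with `"final"` avoids paying full-quality generation time for every experiment.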
|
|
|
|
|
## Prompt Engineering Tips |
|
|
|
|
|
### Structure Your Prompts |
|
|
|
|
|
**Good prompt structure:** |
|
|
``` |
|
|
[Subject] + [Action/Setting] + [Style/Quality] + [Details] |
|
|
``` |
|
|
|
|
|
**Examples:** |
|
|
|
|
|
``` |
|
|
❌ Bad: "a dog" |
|
|
✅ Good: "a golden retriever puppy playing in a flower field, spring afternoon, soft lighting, professional photography" |
|
|
|
|
|
❌ Bad: "castle" |
|
|
✅ Good: "medieval stone castle on a cliff overlooking the ocean, dramatic sunset, fantasy art style, highly detailed" |
|
|
|
|
|
❌ Bad: "portrait" |
|
|
✅ Good: "portrait of an elderly wizard with a long white beard, wise expression, wearing purple robes, oil painting style, Rembrandt lighting"
|
|
``` |
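
The `[Subject] + [Action/Setting] + [Style/Quality] + [Details]` structure above can be captured in a small helper (a sketch; the function name is ours, not part of any library API):

```python
def build_prompt(subject, setting="", style="", details=""):
    """Join the four prompt slots in the recommended order, skipping any left empty."""
    parts = [subject, setting, style, details]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "a golden retriever puppy",
    "playing in a flower field, spring afternoon",
    "professional photography",
    "soft lighting",
)
```

Keeping the slots explicit makes it easy to vary one dimension (say, the style) while holding the rest of the prompt fixed.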
|
|
|
|
|
### Effective Keywords |
|
|
|
|
|
**Quality Modifiers:** |
|
|
- highly detailed, intricate, sharp focus |
|
|
- 4k, 8k, uhd, high resolution |
|
|
- professional photography, award winning |
|
|
- masterpiece, best quality |
|
|
|
|
|
**Style Keywords:** |
|
|
- photorealistic, hyperrealistic, cinematic |
|
|
- oil painting, watercolor, digital art |
|
|
- anime, manga, cartoon style |
|
|
- cyberpunk, steampunk, fantasy |
|
|
|
|
|
**Lighting:** |
|
|
- golden hour, blue hour, dramatic lighting |
|
|
- soft lighting, studio lighting, rim light |
|
|
- volumetric lighting, god rays |
|
|
|
|
|
**Camera/Composition:** |
|
|
- wide angle, telephoto, macro |
|
|
- aerial view, bird's eye view, low angle |
|
|
- rule of thirds, centered composition |
|
|
- bokeh, depth of field |
|
|
|
|
|
### Negative Prompts |
|
|
|
|
|
Common negative prompt additions: |
|
|
``` |
|
|
blurry, low quality, distorted, deformed, ugly, bad anatomy, |
|
|
extra limbs, mutation, disfigured, bad proportions, watermark, |
|
|
signature, text, oversaturated, underexposed |
|
|
``` |
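
These fragments are worth keeping as a reusable baseline so every generation starts from the same floor. A sketch (the constant and helper names are ours):

```python
# Reusable baseline negative prompt built from the fragments above; extend per image.
BASE_NEGATIVE = (
    "blurry, low quality, distorted, deformed, ugly, bad anatomy, "
    "extra limbs, mutation, disfigured, bad proportions, watermark, "
    "signature, text, oversaturated, underexposed"
)

def negative_prompt(*extra):
    """Append scene-specific terms to the baseline, e.g. negative_prompt("daytime", "sunny")."""
    return ", ".join((BASE_NEGATIVE,) + extra)
```

The result is passed as the `negative_prompt` argument in any of the pipeline calls shown earlier.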
|
|
|
|
|
## Performance Optimization |
|
|
|
|
|
### Memory Optimization |
|
|
|
|
|
```python |
|
|
# For GPUs with limited VRAM

pipe.enable_attention_slicing()

pipe.enable_vae_slicing()


# Choose ONE offloading strategy (they are mutually exclusive):

pipe.enable_model_cpu_offload()          # moderate VRAM savings, small speed cost

# pipe.enable_sequential_cpu_offload()   # maximum VRAM savings, much slower
|
|
``` |
|
|
|
|
|
### Speed Optimization |
|
|
|
|
|
```python |
|
|
from diffusers import DPMSolverMultistepScheduler |
|
|
|
|
|
# Use faster scheduler |
|
|
pipe.scheduler = DPMSolverMultistepScheduler.from_config( |
|
|
pipe.scheduler.config |
|
|
) |
|
|
|
|
|
# Reduce inference steps |
|
|
image = pipe(prompt, num_inference_steps=20).images[0] |
|
|
``` |
|
|
|
|
|
### Quality Optimization |
|
|
|
|
|
```python |
|
|
# Use float32 for better quality (if VRAM allows) |
|
|
pipe = StableDiffusionPipeline.from_pretrained( |
|
|
model_id, |
|
|
torch_dtype=torch.float32 |
|
|
) |
|
|
|
|
|
# Increase steps and guidance |
|
|
image = pipe( |
|
|
prompt, |
|
|
num_inference_steps=50, |
|
|
guidance_scale=9.0 |
|
|
).images[0] |
|
|
``` |
|
|
|
|
|
## System Requirements |
|
|
|
|
|
### Minimum Requirements |
|
|
- **GPU**: NVIDIA GPU with 6GB VRAM (e.g., RTX 2060) |
|
|
- **RAM**: 16GB system RAM |
|
|
- **Storage**: 10GB free space |
|
|
- **OS**: Linux, Windows 10+, macOS 12+ |
|
|
- **Python**: 3.8+ |
|
|
|
|
|
### Recommended Requirements |
|
|
- **GPU**: NVIDIA GPU with 12GB+ VRAM (e.g., RTX 3080, 4080) |
|
|
- **RAM**: 32GB system RAM |
|
|
- **Storage**: 20GB free space (SSD recommended) |
|
|
- **OS**: Linux (Ubuntu 20.04+) or Windows 11 |
|
|
- **Python**: 3.10+ |
|
|
|
|
|
### Supported Hardware |
|
|
- CUDA-capable NVIDIA GPUs (Compute Capability 7.0+) |
|
|
- Apple Silicon (M1/M2) with MPS backend |
|
|
- CPU inference (slow, not recommended) |
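
Backend selection across the hardware tiers above reduces to a simple priority: CUDA, then Apple's MPS, then CPU. A minimal sketch (the helper name is ours; with torch installed, feed in the real capability checks):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, fall back to the MPS backend on Apple Silicon, then to (slow) CPU inference."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Typical usage with torch:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
#   pipe = pipe.to(device)
```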
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
- Dataset: Curated collection of high-quality images with captions |
|
|
- Size: Several million image-text pairs
|
|
- Resolution: 512x512 base resolution |
|
|
- Preprocessing: Center crop, normalization, augmentation |
|
|
|
|
|
### Training Configuration |
|
|
- **Optimizer**: AdamW |
|
|
- **Learning Rate**: 1e-5 with cosine decay |
|
|
- **Batch Size**: 256 (accumulated) |
|
|
- **Epochs**: 100+ |
|
|
- **Hardware**: Multiple A100 GPUs |
|
|
- **Training Time**: Several weeks |
|
|
- **Mixed Precision**: FP16/BF16 |
|
|
|
|
|
### Post-Training |
|
|
- EMA (Exponential Moving Average) weights |
|
|
- Safety checker integration |
|
|
- Model pruning and optimization |
|
|
- Comprehensive testing and validation |
|
|
|
|
|
## Limitations and Biases |
|
|
|
|
|
### Known Limitations |
|
|
|
|
|
1. **Text Rendering**: Struggles with accurate text in images |
|
|
2. **Complex Compositions**: May have difficulty with very complex scenes |
|
|
3. **Fine Details**: Small objects or intricate details can be inconsistent |
|
|
4. **Hands and Faces**: Common issues with anatomy, especially hands |
|
|
5. **Physics**: May not always respect real-world physics constraints |
|
|
|
|
|
### Potential Biases |
|
|
|
|
|
- Dataset biases may affect representation of demographics |
|
|
- Western-centric cultural biases in training data |
|
|
- May default to stereotypical representations |
|
|
- Quality varies across different artistic styles |
|
|
|
|
|
### Mitigation Strategies |
|
|
|
|
|
- Use detailed prompts to specify desired characteristics |
|
|
- Iterate with multiple generations |
|
|
- Use negative prompts to avoid unwanted outputs |
|
|
- Consider post-processing for critical applications |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
### Responsible Use |
|
|
|
|
|
- Always disclose AI-generated content |
|
|
- Respect copyright and intellectual property |
|
|
- Avoid generating harmful or offensive content |
|
|
- Consider privacy implications |
|
|
- Use content moderation for public applications |
|
|
|
|
|
### Content Policy |
|
|
|
|
|
This model should not be used to generate: |
|
|
- Non-consensual intimate imagery |
|
|
- Child sexual abuse material |
|
|
- Extreme violence or gore |
|
|
- Hate speech or discriminatory content |
|
|
- Misleading deepfakes |
|
|
- Content violating platform policies |
|
|
|
|
|
## Evaluation Results |
|
|
|
|
|
### Quantitative Metrics |
|
|
|
|
|
| Metric | Score | |
|
|
|--------|-------| |
|
|
| FID Score | 12.3 | |
|
|
| IS Score | 28.5 | |
|
|
| CLIP Score | 0.31 | |
|
|
| User Preference | 7.8/10 | |
|
|
|
|
|
### Qualitative Assessment |
|
|
|
|
|
- **Photorealism**: Excellent for landscapes, good for portraits |
|
|
- **Artistic Styles**: Strong performance across various art styles |
|
|
- **Prompt Adherence**: High fidelity to detailed prompts |
|
|
- **Consistency**: Reliable output quality with proper parameters |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{trouter-imagine-1, |
|
|
title={Trouter-Imagine-1: Open Source Text-to-Image Generation}, |
|
|
author={OpenTrouter Team}, |
|
|
year={2025}, |
|
|
publisher={Hugging Face}, |
|
|
howpublished={\url{https://huggingface.co/OpenTrouter/Trouter-Imagine-1}}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
This model is released under the **Apache License 2.0**. |
|
|
|
|
|
You are free to: |
|
|
- Use commercially |
|
|
- Modify and distribute |
|
|
- Use privately |
|
|
- Rely on the express patent grant from contributors
|
|
|
|
|
Conditions: |
|
|
- Include license and copyright notice |
|
|
- State changes made to the code |
|
|
- Include NOTICE file if provided |
|
|
|
|
|
See the [LICENSE](LICENSE) file for full details. |
|
|
|
|
|
## Model Card Contact |
|
|
|
|
|
For questions, issues, or collaboration opportunities: |
|
|
- **Repository**: https://huggingface.co/OpenTrouter/Trouter-Imagine-1 |
|
|
- **Issues**: Use the Community tab for support |
|
|
- **Updates**: Watch this repository for model updates |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Built on the foundation of open-source diffusion research and the Hugging Face ecosystem. Thanks to the AI research community for advancing generative models. |
|
|
|
|
|
--- |
|
|
|
|
|
**Version**: 1.0 |
|
|
**Last Updated**: November 2025 |
|
|
**Status**: Production Ready |