---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---
# Sage-T2I
**Photorealistic Diffusion Transformer — 1024×1024 native generation + 4K upscale**
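A from-scratch Diffusion Transformer (DiT) trained on real STL-10 photographs.
It generates photorealistic images at 1024×1024 natively, in a single diffusion pass,
and the result can be upscaled to 4K width (3840×3840) with PIL's LANCZOS resampling:
a fixed interpolation filter, not a learned super-resolution model such as SRGAN or ESRGAN.
**This is a real trained model. Every pixel of the 1024×1024 output comes from the diffusion process; the 3840×3840 version is an interpolated upscale of that output. No simulations, no mocks, no fakes.**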
| Hub | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |
## Model Architecture
| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M (trained), up to 300M (configurable) |
| **Text Encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden Size** | 384 |
| **Layers** | 12 |
| **Heads** | 6 |
| **Training / Inference Resolution** | 128x128 training images -> 1024x1024 inference (pos_embed interpolation) |
| **Upscaling** | PIL LANCZOS resampling to 3840x3840 (4K-width square output) |
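As a rough sanity check on the 43.4M figure, the sketch below counts parameters for 12 blocks with hidden size 384. The block layout is an assumption, not read from this repo: a standard DiT block (adaLN modulation producing six d-sized vectors, self-attention, MLP ratio 4) plus a cross-attention layer whose keys and values are projected from 768-dimensional CLIP ViT-L/14 text features. LayerNorm, patch-embedding, timestep-embedding and final-layer weights are left out, and all function names are illustrative.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def dit_block_params(d: int, ctx_dim: int, mlp_ratio: int = 4) -> int:
    """Parameters in one DiT block with cross-attention (assumed layout, LayerNorms ignored)."""
    self_attn = linear_params(d, 3 * d) + linear_params(d, d)  # fused QKV + output projection
    cross_attn = (
        linear_params(d, d)                  # queries from image tokens
        + 2 * linear_params(ctx_dim, d)      # keys and values from text tokens
        + linear_params(d, d)                # output projection
    )
    mlp = linear_params(d, mlp_ratio * d) + linear_params(mlp_ratio * d, d)
    adaln = linear_params(d, 6 * d)          # shift/scale/gate for attention and MLP
    return self_attn + cross_attn + mlp + adaln


def dit_stack_params(d: int, layers: int, ctx_dim: int, mlp_ratio: int = 4) -> int:
    """Blocks only: no patch embedding, timestep embedding, or final layer."""
    return layers * dit_block_params(d, ctx_dim, mlp_ratio)


if __name__ == "__main__":
    total = dit_stack_params(d=384, layers=12, ctx_dim=768)
    print(f"{total:,} parameters ({total / 1e6:.1f}M) in the 12 blocks")
```

Under these assumptions the 12 blocks hold 42,554,880 parameters, in the same range as the quoted 43.4M; if the real layout matches, the omitted embedding and output layers would account for part of the gap. The head count does not enter the formula, because splitting 384 channels into 6 heads does not change the projection sizes.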
## Capabilities
- **Native 1024x1024 generation** - a single diffusion pass at full resolution, no tiling or chaining
- **4K output** - LANCZOS resampling of the 1024x1024 result to 3840x3840
- **Multi-resolution** - 256, 512, and 1024 are all supported via pos_embed interpolation
- **Photorealism** - trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel of the 1024x1024 output comes from the diffusion process
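Multi-resolution support rests on the token grid growing with image size while the position embedding is resized to match. A minimal sketch, assuming KL-F8's 8x spatial downsampling, a hypothetical patch size of 2 (the card does not state it), and bilinear resizing of a square pos_embed grid. The helper names are illustrative, and the repo may interpolate differently (for example with `torch.nn.functional.interpolate`).

```python
import math

VAE_DOWNSAMPLE = 8  # KL-F8 autoencoder: 8x spatial downsampling
PATCH_SIZE = 2      # hypothetical patch size; not stated in this card


def token_grid_side(image_side: int, patch: int = PATCH_SIZE) -> int:
    """Side of the square token grid the transformer sees for a square image."""
    latent_side = image_side // VAE_DOWNSAMPLE
    return latent_side // patch


def _source_coord(i: int, scale: float, old_side: int) -> tuple[int, int, float]:
    """Map output index i to (lower index, upper index, fractional weight) in the old grid."""
    pos = (i + 0.5) * scale - 0.5             # align cell centres
    pos = min(max(pos, 0.0), old_side - 1.0)  # clamp at the borders
    lo = int(math.floor(pos))
    hi = min(lo + 1, old_side - 1)
    return lo, hi, pos - lo


def resize_pos_embed(grid: list[list[list[float]]], new_side: int) -> list[list[list[float]]]:
    """Bilinearly resize a square grid of embedding vectors to new_side x new_side."""
    old_side = len(grid)
    dim = len(grid[0][0])
    scale = old_side / new_side
    out = []
    for i in range(new_side):
        y0, y1, wy = _source_coord(i, scale, old_side)
        row = []
        for j in range(new_side):
            x0, x1, wx = _source_coord(j, scale, old_side)
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][k] + wx * grid[y0][x1][k])
                + wy * ((1 - wx) * grid[y1][x0][k] + wx * grid[y1][x1][k])
                for k in range(dim)
            ])
        out.append(row)
    return out


for side in (128, 256, 512, 1024):
    g = token_grid_side(side)
    print(f"{side}px -> {g}x{g} grid = {g * g} tokens")
```

With these assumptions, 128px training images give an 8x8 token grid and 1024px images give 64x64 (64 vs 4,096 tokens). The 64x ratio does not depend on the patch size, and because self-attention cost grows with the square of the token count, each block does far more work at 1024px than at 128px.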
## Training
- **Dataset:** STL-10 (5,000 real labeled 96x96 photographs, 500 in each of 10 classes)
- **Hardware:** CPU (optimized), AMD/NVIDIA GPU support
- **Optimizer:** SGD with momentum
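For reference, SGD with momentum keeps a velocity buffer per parameter. The sketch below follows the update convention of PyTorch's `torch.optim.SGD` with dampening, Nesterov and weight decay disabled, applied to a toy one-parameter quadratic. The learning rate and momentum values are illustrative; the ones used to train Sage-T2I are not stated in this card.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.05, momentum=0.9):
    """One update in the torch.optim.SGD convention (no dampening, Nesterov, or weight decay).

    v <- momentum * v + g
    p <- p - lr * v
    """
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimize_toy_quadratic(steps: int = 300) -> float:
    """Minimize f(x) = (x - 3)^2 from x = 0; its gradient is 2 * (x - 3)."""
    params, velocity = [0.0], [0.0]
    for _ in range(steps):
        grads = [2.0 * (params[0] - 3.0)]
        params, velocity = sgd_momentum_step(params, grads, velocity)
    return params[0]


print(f"x after 300 steps: {minimize_toy_quadratic():.6f}")
```

With a zero-initialized buffer the first step is plain SGD; after that the buffer accumulates past gradients, which smooths noisy mini-batch updates.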
## Usage
### Local Inference
```python
from model.pipeline import SageT2IPipeline
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
```
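The 4K step is ordinary resampling. In Pillow it would presumably be a call like `image.resize((3840, 3840), Image.Resampling.LANCZOS)` (Pillow 9.1 and later); the pipeline's exact call is not shown here. To make clear what that filter computes, the stdlib-only sketch below implements a Lanczos kernel with a = 3 (the support Pillow uses for LANCZOS) and applies it separably, rows then columns. It is illustrative and not guaranteed to match Pillow's output bit for bit.

```python
import math


def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if -a < x < a:
        px = math.pi * x
        return a * math.sin(px) * math.sin(px / a) / (px * px)
    return 0.0


def lanczos_resample_1d(samples: list[float], out_len: int, a: int = 3) -> list[float]:
    """Resample a 1D signal to out_len points (upscaling only; downscaling needs a wider kernel)."""
    in_len = len(samples)
    scale = in_len / out_len                      # input pixels per output pixel
    out = []
    for i in range(out_len):
        center = (i + 0.5) * scale - 0.5          # output pixel centre in input coordinates
        first = math.floor(center) - a + 1        # leftmost of the 2a taps
        acc = 0.0
        weight_sum = 0.0
        for j in range(first, first + 2 * a):
            w = lanczos_kernel(center - j, a)
            src = min(max(j, 0), in_len - 1)      # replicate edge pixels
            acc += w * samples[src]
            weight_sum += w
        out.append(acc / weight_sum)              # normalize so flat regions stay flat
    return out


def lanczos_resize_2d(image: list[list[float]], out_h: int, out_w: int, a: int = 3) -> list[list[float]]:
    """Separable 2D resize of a single-channel image: horizontal pass, then vertical."""
    rows = [lanczos_resample_1d(row, out_w, a) for row in image]
    cols = [lanczos_resample_1d([row[c] for row in rows], out_h, a) for c in range(out_w)]
    return [[cols[c][r] for c in range(out_w)] for r in range(out_h)]


# Tiny example: a 2x2 gradient upscaled 3x; an RGB image would be processed per channel.
small = [[0.0, 100.0], [100.0, 200.0]]
for row in lanczos_resize_2d(small, 6, 6):
    print([round(v, 1) for v in row])
```

Each output pixel is a weighted sum of six input pixels per axis, so the upscale adds no detail the 1024x1024 image did not already contain. The kernel's negative lobes can overshoot near sharp edges, so values must be clamped to 0-255 before saving as 8-bit.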
### Gradio Web UI
```bash
python app.py
```
### Local Training
```bash
python train_local.py
```
## Deployment
### Deploy to Hugging Face (Model Hub + Space)
The project includes an automated deployment script. It will:
1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo
```bash
# Set your token (get one at https://hf.co/settings/tokens)
export HF_TOKEN=hf_your_token_here   # Windows cmd: set HF_TOKEN=hf_your_token_here
# Deploy both model hub and space
python deploy_to_hf.py
# Deploy just the model hub
python deploy_to_hf.py --model-only
# Deploy just the space
python deploy_to_hf.py --space-only
```
The script will prompt for your token if `HF_TOKEN` is not set.
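The deployment script's first step checks the checkpoint's size and tensor count, but its actual thresholds are not documented here. The sketch below shows one plausible form of the size half, assuming weights are stored as fp32 (4 bytes each), so a file holding 43.4M parameters should be at least about 173.6 MB. The function names and the 0.95 slack factor are hypothetical. A tensor-count check would additionally need to load the file with `torch.load` and count the tensors in its state dict, which this stdlib-only sketch leaves out.

```python
import os

BYTES_PER_FP32 = 4


def min_checkpoint_bytes(n_params: int, bytes_per_param: int = BYTES_PER_FP32) -> int:
    """Lower bound on the size of a checkpoint that stores every weight once."""
    return n_params * bytes_per_param


def checkpoint_looks_complete(path: str, n_params: int, slack: float = 0.95) -> bool:
    """Hypothetical size check: the file exists and holds at least ~the raw weight bytes.

    `slack` tolerates rounding in the quoted parameter count.
    """
    if not os.path.isfile(path):
        return False
    return os.path.getsize(path) >= slack * min_checkpoint_bytes(n_params)


if __name__ == "__main__":
    # 43.4M fp32 parameters is 173.6 MB of raw weights.
    print(min_checkpoint_bytes(43_400_000) / 1e6, "MB")
    print(checkpoint_looks_complete("checkpoints/dit_best.pt", 43_400_000))
```

A checkpoint that also stores optimizer state (for SGD with momentum, one buffer per parameter) will be larger than this bound, which is why the check is one-sided.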
### Manual Deployment
#### Model Hub
```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy checkpoint into checkpoints/ directory
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```
#### Space (Gradio Web UI)
1. Go to https://huggingface.co/new-space
2. Set Space name: `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
6. For the model checkpoint, either:
   - Upload it via git LFS to the Space repo, or
   - Set a `MODEL_PATH` Space secret that points to the model hub
### Self-Hosted
```bash
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```
## Hugging Face Resources
- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true