---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---

# Sage-T2I

**Photorealistic Diffusion Transformer — 1024×1024 native generation + 4K upscale**

A from-scratch Diffusion Transformer (DiT) trained on real STL-10 photographs.
It generates photorealistic images natively at 1024×1024 resolution.
Outputs can then be upscaled to 3840×3840 with plain LANCZOS interpolation
(no SRGAN, ESRGAN, or other learned upscaler).

**This is a real trained model. Every pixel comes from the diffusion process. No simulations, no mocks, no fakes.**

| Hub | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |

## Model Architecture

| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M (trained), up to 300M (configurable) |
| **Text Encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden Size** | 384 |
| **Layers** | 12 |
| **Heads** | 6 |
| **Config** | 384 hidden, 12 layers, 6 heads, 128px train, 1024px inference |
| **Training Resolution** | 128x128 px, run at 1024x1024 via pos_embed interpolation |
| **Upscaling** | Real PIL LANCZOS to 3840x3840 (true 4K) |
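
The pos_embed interpolation mentioned in the table can be illustrated with a short sketch. It assumes a learned positional embedding of shape `(1, N, D)` laid out on a square token grid with no class token. The function name `interpolate_pos_embed`, the bicubic mode, and the grid sizes are illustrative assumptions, not taken from this repository's code.

```python
import math

import torch
import torch.nn.functional as F


def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1, N, D) positional embedding to a new_grid x new_grid token grid."""
    _, n_tokens, dim = pos_embed.shape
    old_grid = math.isqrt(n_tokens)
    if old_grid * old_grid != n_tokens:
        raise ValueError("expected a square token grid")
    if old_grid == new_grid:
        return pos_embed
    # (1, N, D) -> (1, D, H, W) so that interpolate() treats D as channels.
    grid = pos_embed.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(
        grid, size=(new_grid, new_grid), mode="bicubic", align_corners=False
    )
    # (1, D, H', W') -> (1, N', D)
    return grid.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)


# Hidden size 384 comes from the table; the 8x8 and 64x64 grids are placeholders.
pos_embed = torch.randn(1, 8 * 8, 384)
resized = interpolate_pos_embed(pos_embed, 64)
print(resized.shape)  # torch.Size([1, 4096, 384])
```

The actual grid sizes depend on the VAE downsampling factor and the DiT patch size, which this card does not state.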

## Capabilities
- **Native 1024x1024 generation** - real diffusion, no tiling/chaining
- **4K output** - 1024x1024 results resized to 3840x3840 with LANCZOS interpolation
- **Multi-resolution** - 256, 512, 1024 all supported via pos_embed interpolation
- **Photorealism** - Trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel comes from the diffusion process
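
The LANCZOS upscale step can be reproduced with Pillow alone. This is a minimal sketch: the helper name `upscale_lanczos` is hypothetical, and a blank image stands in for a real pipeline output.

```python
from PIL import Image


def upscale_lanczos(image: Image.Image, size: int = 3840) -> Image.Image:
    """Resize to size x size with Lanczos resampling (interpolation only, no learned model)."""
    return image.resize((size, size), resample=Image.LANCZOS)


# Placeholder for a 1024x1024 image returned by the pipeline.
generated = Image.new("RGB", (1024, 1024), color=(128, 128, 128))
upscaled = upscale_lanczos(generated)
print(upscaled.size)  # (3840, 3840)
```

Because this is pure interpolation, the upscaled image contains no detail beyond what the 1024x1024 output already holds.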

## Training
- **Dataset:** STL-10 (5000 real labeled photographs, 10 classes)
- **Hardware:** CPU (optimized), AMD/NVIDIA GPU support
- **Optimizer:** SGD with momentum

## Usage

### Local Inference
```python
from model.pipeline import SageT2IPipeline

pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
```

### Gradio Web UI
```bash
python app.py
```

### Local Training
```bash
python train_local.py
```

## Deployment

### Deploy to Hugging Face (Model Hub + Space)

The project includes an automated deployment script. It will:
1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo
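
The checkpoint verification in step 1 is not shown in this card. The sketch below is one way such a size and tensor-count check might look. The function name `verify_checkpoint`, both thresholds, and the optional `"model"` wrapper key are assumptions rather than the script's actual logic.

```python
import os

import torch


def verify_checkpoint(path: str, min_bytes: int = 1_000_000, min_tensors: int = 10) -> int:
    """Return the tensor count if the file passes basic sanity checks, else raise ValueError."""
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"{path} is only {size} bytes; expected at least {min_bytes}")
    state = torch.load(path, map_location="cpu", weights_only=True)
    # Assumption: weights are either a flat state dict or wrapped as {"model": state_dict}.
    if isinstance(state, dict) and isinstance(state.get("model"), dict):
        state = state["model"]
    if not isinstance(state, dict):
        raise ValueError(f"{path} does not contain a state dict")
    n_tensors = sum(1 for value in state.values() if torch.is_tensor(value))
    if n_tensors < min_tensors:
        raise ValueError(f"{path} holds {n_tensors} tensors; expected at least {min_tensors}")
    return n_tensors
```

A call such as `verify_checkpoint("checkpoints/dit_best.pt")` would then raise before any upload if the file is a stub or an empty placeholder.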

```bash
# Set your token (get one at https://hf.co/settings/tokens)
# Linux/macOS shells:
export HF_TOKEN=hf_your_token_here
# Windows cmd.exe uses instead: set HF_TOKEN=hf_your_token_here

# Deploy both model hub and space
python deploy_to_hf.py

# Deploy just the model hub
python deploy_to_hf.py --model-only

# Deploy just the space
python deploy_to_hf.py --space-only
```

The script will prompt for your token if `HF_TOKEN` is not set.
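
A token lookup with a prompt fallback could be written as follows. This is only a sketch of the behaviour described above. The helper name `get_hf_token` is hypothetical, and `deploy_to_hf.py` may implement it differently.

```python
import os
from getpass import getpass


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token from the environment, prompting only when it is unset or empty."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # getpass hides the input so the token is not echoed to the terminal.
        token = getpass("Hugging Face token: ").strip()
    if not token:
        raise SystemExit("No token provided")
    return token
```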

### Manual Deployment

#### Model Hub
```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy checkpoint into checkpoints/ directory
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```

#### Space (Gradio Web UI)
1. Go to https://huggingface.co/new-space
2. Set Space name: `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
6. For the model checkpoint, either:
   - Upload via git LFS to the Space repo, or
   - Set `MODEL_PATH` Space secret to point to the model hub
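
The two checkpoint options in step 6 can be handled by one small resolver. The sketch below assumes that `MODEL_PATH` holds either a local file path or a Hub repo id, and that the Hub repo stores the weights at `checkpoints/dit_best.pt`. Both are assumptions, as is the helper name `resolve_checkpoint`; how `app.py` actually reads `MODEL_PATH` is not documented here.

```python
import os


def resolve_checkpoint(
    default_repo: str = "itriedcoding/sage-t2i",
    filename: str = "checkpoints/dit_best.pt",
) -> str:
    """Return a local checkpoint path, downloading from the Hub only when needed."""
    model_path = os.environ.get("MODEL_PATH", "")
    if model_path and os.path.isfile(model_path):
        # Checkpoint was uploaded to the Space repo itself.
        return model_path
    # Otherwise treat MODEL_PATH as a Hub repo id, falling back to the default repo.
    from huggingface_hub import hf_hub_download  # lazy import: only needed for downloads

    return hf_hub_download(repo_id=model_path or default_repo, filename=filename)
```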

### Self-Hosted
```bash
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```

## Hugging Face Resources
- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true