---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---

# Sage-T2I

**Photorealistic Diffusion Transformer — 1024×1024 native generation + 4K upscale**

A from-scratch Diffusion Transformer (DiT) trained on real STL-10 photographs.
It generates photorealistic images at 1024×1024 directly, without tiling,
and can upscale them to 4K (3840×3840) with standard LANCZOS interpolation.
No learned super-resolution model such as SRGAN or ESRGAN is involved.

**This is a real trained model. Every pixel of the native output comes from the diffusion process, and the 4K version is a LANCZOS interpolation of those pixels. No simulations, no mocks, no fakes.**

| Resource | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |

## Model Architecture

| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M (trained), up to 300M (configurable) |
| **Text Encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden Size** | 384 |
| **Layers** | 12 |
| **Heads** | 6 |
| **Training Resolution** | 128x128 px (16x16 KL-F8 latent); 1024x1024 at inference (128x128 latent) via pos_embed interpolation |
| **Upscaling** | PIL LANCZOS resampling to 3840x3840 (4K UHD width, square aspect ratio) |
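
As a rough sanity check on the 43.4M figure, the sketch below counts parameters for a DiT-style backbone using the listed hidden size (384) and depth (12). Everything else is an assumption, not documented here: adaLN-Zero conditioning and LayerNorms without learnable affine weights as in the original DiT, an MLP ratio of 4, patch size 2 on the 4-channel KL-F8 latent, a noise-only output head, and one cross-attention layer per block whose keys and values are projected from the 768-dimensional token embeddings of the CLIP ViT-L/14 text encoder.

```python
def linear(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters in a dense layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def dit_block(hidden: int, context_dim: int, mlp_ratio: int = 4) -> int:
    """One block: self-attention, cross-attention, MLP, and adaLN-Zero modulation."""
    self_attn = linear(hidden, 3 * hidden) + linear(hidden, hidden)  # fused QKV + output projection
    cross_attn = (linear(hidden, hidden)                              # queries from image tokens
                  + 2 * linear(context_dim, hidden)                   # keys and values from text tokens
                  + linear(hidden, hidden))                           # output projection
    mlp = linear(hidden, mlp_ratio * hidden) + linear(mlp_ratio * hidden, hidden)
    adaln = linear(hidden, 6 * hidden)                                # shift/scale/gate for attention and MLP
    return self_attn + cross_attn + mlp + adaln


def dit_total(hidden: int = 384, depth: int = 12, context_dim: int = 768,
              latent_channels: int = 4, patch: int = 2, freq_dim: int = 256) -> int:
    """Whole trainable backbone, excluding the frozen CLIP encoder and VAE."""
    patch_embed = latent_channels * patch * patch * hidden + hidden   # conv with kernel = stride = patch
    timestep_mlp = linear(freq_dim, hidden) + linear(hidden, hidden)
    final_layer = linear(hidden, 2 * hidden) + linear(hidden, patch * patch * latent_channels)
    return patch_embed + timestep_mlp + depth * dit_block(hidden, context_dim) + final_layer


print(f"{dit_total():,} parameters (~{dit_total() / 1e6:.1f}M)")  # 43,109,776 parameters (~43.1M)
```

Under these assumptions the count is about 43.1M, close to the reported 43.4M; the small gap could come from components not described here, such as a learned positional embedding or a text-projection layer. The head count (6, giving 64 dimensions per head) does not change the parameter count, only how the 384-dimensional attention is split.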

## Capabilities
- **Native 1024x1024 generation** - real diffusion, no tiling/chaining
- **4K output** - LANCZOS upscale of the 1024x1024 result to 3840x3840
- **Multi-resolution** - 256, 512, 1024 all supported via pos_embed interpolation
- **Photorealism** - Trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel comes from the diffusion process
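
Multi-resolution generation relies on resizing the positional-embedding grid to the larger token grid used at inference (for example from 8x8 to 64x64 tokens if the patch size is 2). Below is a minimal NumPy sketch of corner-aligned bilinear interpolation over a flattened square grid. Whether Sage-T2I resizes a learned table or uses this exact scheme is not stated here, so the function name and details are hypothetical.

```python
import numpy as np


def interpolate_pos_embed(pos_embed: np.ndarray, new_side: int) -> np.ndarray:
    """Bilinearly resize a flattened (side*side, dim) positional grid to (new_side*new_side, dim)."""
    n, dim = pos_embed.shape
    side = int(round(n ** 0.5))
    if side * side != n:
        raise ValueError("pos_embed must hold a square grid of tokens")
    grid = pos_embed.reshape(side, side, dim)

    # Sample positions in old-grid coordinates; first and last rows/columns line up exactly.
    coords = np.linspace(0.0, side - 1, new_side)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, side - 1)
    frac = coords - lo

    # Interpolate along rows, then along columns.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return out.reshape(new_side * new_side, dim)


train_pe = np.random.default_rng(0).standard_normal((8 * 8, 384))  # 8x8 grid, hidden size 384
infer_pe = interpolate_pos_embed(train_pe, 64)                    # 64x64 grid for 1024 px
print(infer_pe.shape)  # (4096, 384)
```

Interpolation keeps neighbouring tokens' embeddings similar at the new resolution, but the model never saw the in-between positions during training, so quality at resolutions far above the training size is not guaranteed by this step alone.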

## Training
- **Dataset:** STL-10 (5,000 real labeled photographs, 96x96 px, 10 classes)
- **Hardware:** CPU-optimized training, with AMD and NVIDIA GPU support
- **Optimizer:** SGD with momentum
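
For reference, here is a plain-Python sketch of the heavy-ball update that PyTorch's `torch.optim.SGD` applies when `momentum` is set, with its defaults `dampening=0` and `nesterov=False` and no weight decay. The velocity accumulates gradients (`v = momentum * v + g`) and the parameter steps along it (`theta = theta - lr * v`). The learning rate and momentum used to train Sage-T2I are not documented here, so the values below are placeholders, and the quadratic is a toy stand-in for the diffusion loss.

```python
def sgd_momentum(grad_fn, theta: float, lr: float = 0.1, momentum: float = 0.9,
                 steps: int = 500) -> float:
    """Minimise a 1-D function with SGD + momentum (torch.optim.SGD semantics, dampening=0)."""
    velocity = 0.0
    for _ in range(steps):
        g = grad_fn(theta)
        velocity = momentum * velocity + g  # running, decaying sum of past gradients
        theta = theta - lr * velocity       # step along the accumulated direction
    return theta


# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3); minimum at theta = 3.
result = sgd_momentum(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(round(result, 4))  # 3.0
```

In PyTorch the equivalent setup is `torch.optim.SGD(model.parameters(), lr=..., momentum=...)`.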

## Usage

### Local Inference
```python
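from model.pipeline import SageT2IPipeline

# Load the trained DiT checkpoint
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
# num_steps: number of denoising steps; output_size: side length in pixels (256, 512, or 1024)
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")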
```
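
To see what `output_size` costs, the arithmetic below converts each supported resolution into a transformer sequence length. It uses the KL-F8 VAE's 8x spatial downsampling and assumes a patch size of 2, which is not documented here. Self-attention work grows with the square of the token count, and total generation time grows roughly in proportion to `num_steps`.

```python
VAE_DOWNSAMPLE = 8   # KL-F8: the latent is 8x smaller than the image per side
PATCH = 2            # assumption: DiT patch size (not documented)


def tokens_per_image(side_px: int) -> int:
    """Number of transformer tokens for a square image of side_px pixels."""
    latent_side = side_px // VAE_DOWNSAMPLE
    return (latent_side // PATCH) ** 2


train_tokens = tokens_per_image(128)  # 128 px training resolution from the architecture table
for side in (256, 512, 1024):
    n = tokens_per_image(side)
    # self-attention work relative to the training resolution scales with n^2
    print(f"{side:>4} px -> {n:>5} tokens, self-attention ~{(n / train_tokens) ** 2:,.0f}x the 128 px cost")
```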

### Gradio Web UI
```bash
python app.py
```

### Local Training
```bash
python train_local.py
```

## Deployment

### Deploy to Hugging Face (Model Hub + Space)

The project includes an automated deployment script. It will:
1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo
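
The checks in step 1 can be approximated as follows. This is a hypothetical sketch, not the logic in `deploy_to_hf.py`: it derives a minimum file size from the documented 43.4M parameters stored as 32-bit floats (4 bytes each, about 174 MB) and requires a minimum number of tensors in the state dict. The helper name, the float32 assumption, and the `min_tensors` threshold are all illustrative.

```python
import os
from collections.abc import Mapping

DOCUMENTED_PARAMS = 43_400_000  # "43.4M (trained)" from the architecture table
BYTES_PER_PARAM = 4             # assumption: weights stored as float32


def checkpoint_looks_real(path: str, state_dict: Mapping, params: int = DOCUMENTED_PARAMS,
                          min_tensors: int = 100, slack: float = 0.9) -> bool:
    """Hypothetical sanity check: file large enough for the weights and enough tensors present."""
    # 43.4M float32 weights need about 174 MB; slack tolerates rounding in "43.4M".
    min_bytes = int(params * BYTES_PER_PARAM * slack)
    if os.path.getsize(path) < min_bytes:
        return False  # too small to hold the weights (e.g. a Git LFS pointer file)
    return len(state_dict) >= min_tensors  # placeholder threshold, not taken from deploy_to_hf.py


# state_dict would typically come from torch.load(path, map_location="cpu"); if the file also
# stores optimizer state or metadata, pass the model's own sub-dictionary instead.
```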

```bash
# Set your token (get one at https://hf.co/settings/tokens)
export HF_TOKEN=hf_your_token_here   # Windows cmd: set HF_TOKEN=hf_your_token_here

# Deploy both model hub and space
python deploy_to_hf.py

# Deploy just the model hub
python deploy_to_hf.py --model-only

# Deploy just the space
python deploy_to_hf.py --space-only
```

The script will prompt for your token if `HF_TOKEN` is not set.
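
That token lookup can be sketched with the standard library: read `HF_TOKEN` from the environment and fall back to a hidden prompt via `getpass`. This illustrates the described behaviour rather than reproducing the script's code; `resolve_hf_token` is a hypothetical name.

```python
import getpass
import os


def resolve_hf_token(env=None, prompt=getpass.getpass) -> str:
    """Return HF_TOKEN from the environment, or ask for it without echoing input."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise SystemExit("No Hugging Face token provided.")
    return token
```

The `huggingface_hub` library also reads `HF_TOKEN` on its own, so once the variable is set, its API calls authenticate without an explicit `token=` argument.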

### Manual Deployment

#### Model Hub
```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
mkdir -p checkpoints
cp /path/to/dit_best.pt checkpoints/   # copy your trained checkpoint into the repo
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```

#### Space (Gradio Web UI)
1. Go to https://huggingface.co/new-space
2. Set Space name: `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files: `README.md` (its YAML header sets `sdk`, `sdk_version`, and `app_file`), `app.py`, `.space`, `requirements.txt`, and the model package
6. For the model checkpoint, either:
   - Upload via git LFS to the Space repo, or
   - Add a `MODEL_PATH` Space secret pointing to the model on the Hub (Space secrets are exposed to `app.py` as environment variables)
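
One way `app.py` could honour `MODEL_PATH` is sketched below: use the value directly if it is a local file, otherwise treat it as a Hub repo id and fetch the checkpoint with `huggingface_hub.hf_hub_download`. The function name and the in-repo filename `checkpoints/dit_best.pt` (taken from the examples in this README) are assumptions; the project's actual loader may differ.

```python
import os
from typing import Optional

DEFAULT_CHECKPOINT = "checkpoints/dit_best.pt"  # assumption: path used in this README's examples


def resolve_checkpoint(model_path: Optional[str] = None, filename: str = DEFAULT_CHECKPOINT) -> str:
    """Return a local checkpoint path from an argument, MODEL_PATH, or a Hub repo id."""
    model_path = model_path or os.environ.get("MODEL_PATH", DEFAULT_CHECKPOINT)
    if os.path.isfile(model_path):
        return model_path
    # Not a local file: treat the value as a repo id such as "itriedcoding/sage-t2i".
    from huggingface_hub import hf_hub_download  # imported lazily; only needed for Hub downloads
    return hf_hub_download(repo_id=model_path, filename=filename)
```

`hf_hub_download` caches the file locally, so later restarts of the Space reuse the download when the cache persists.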

### Self-Hosted
```bash
git lfs install   # needed so the checkpoint is downloaded instead of an LFS pointer file
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```

## Hugging Face Resources
- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true