---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---
# Sage-T2I
**Photorealistic Diffusion Transformer: 1024×1024 native generation + 4K upscale**

A from-scratch Diffusion Transformer (DiT) trained on real photographs from STL-10.
It generates images natively at 1024×1024 and upscales them to 4K (3840×3840)
with LANCZOS interpolation rather than a learned upscaler such as SRGAN or ESRGAN.

**This is a real trained model. Every pixel comes from the diffusion process. No simulations, no mocks, no fakes.**
| Hub | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |
## Model Architecture
| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M (trained), up to 300M (configurable) |
| **Text Encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden Size** | 384 |
| **Layers** | 12 |
| **Heads** | 6 |
| **Config** | 384 hidden, 12 layers, 6 heads, 128px train, 1024px inference |
| **Training Resolution** | 128x128 latent -> 1024x1024 (pos_embed interpolation) |
| **Upscaling** | Real PIL LANCZOS to 3840x3840 (true 4K) |
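
To make the resolution figures above concrete, the sketch below works out the latent grid and token count for each supported image size. It assumes the KL-F8 VAE downsamples by 8 per side (the usual meaning of "F8") and uses a patch size of 2, which is a hypothetical value: the actual patch size is not stated in this README.

```python
VAE_DOWNSAMPLE = 8  # assumption: KL-F8 reduces each spatial side by 8
PATCH_SIZE = 2      # hypothetical value, not stated in this README


def token_grid(image_px, downsample=VAE_DOWNSAMPLE, patch=PATCH_SIZE):
    """Return (latent_side, tokens_per_side, total_tokens) for a square image."""
    if image_px % (downsample * patch) != 0:
        raise ValueError("image size must be divisible by downsample * patch")
    latent_side = image_px // downsample
    tokens_per_side = latent_side // patch
    return latent_side, tokens_per_side, tokens_per_side ** 2


for px in (128, 256, 512, 1024):
    latent, side, total = token_grid(px)
    print(f"{px}px -> latent {latent}x{latent}, {side}x{side} = {total} tokens")
```

Under these assumptions the position-embedding grid grows from 8x8 at the 128px training size to 64x64 at 1024px, which is the gap that pos_embed interpolation has to bridge.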
## Capabilities
- **Native 1024x1024 generation** - real diffusion, no tiling/chaining
- **4K output** - LANCZOS upscale of the 1024x1024 result to 3840x3840
- **Multi-resolution** - 256, 512, 1024 all supported via pos_embed interpolation
- **Photorealism** - Trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel comes from the diffusion process
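
The 4K upscale step can be sketched with Pillow as follows. This is a minimal illustration rather than the pipeline's own code: the helper name `upscale_lanczos` is hypothetical, and a blank image stands in for a generated one.

```python
from PIL import Image


def upscale_lanczos(image, target=3840):
    """Resize an image to target x target pixels using the Lanczos filter."""
    return image.resize((target, target), resample=Image.LANCZOS)


# Stand-in for a generated 1024x1024 image; a real run would use the pipeline output.
generated = Image.new("RGB", (1024, 1024), color=(128, 128, 128))
upscaled = upscale_lanczos(generated)
print(upscaled.size)
```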
## Training
- **Dataset:** STL-10 (5000 real labeled photographs, 10 classes)
- **Hardware:** CPU (optimized), AMD/NVIDIA GPU support
- **Optimizer:** SGD with momentum
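
For reference, the update rule behind SGD with momentum can be written in a few lines of plain Python. This sketch follows the common convention `v = momentum * v + grad`, `param = param - lr * v`. The learning rate and momentum values are illustrative only; the values actually used for training are not given in this README.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """Apply one in-place SGD-with-momentum update to lists of floats."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g  # accumulate a running gradient
        params[i] -= lr * velocity[i]             # step along the velocity
    return params, velocity


# Toy problem: minimize sum(p^2), whose gradient is 2 * p.
params, velocity = [1.0, -2.0], [0.0, 0.0]
for _ in range(2):
    grads = [2.0 * p for p in params]
    sgd_momentum_step(params, grads, velocity, lr=0.1, momentum=0.9)
print(params)
```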
## Usage
### Local Inference
```python
from model.pipeline import SageT2IPipeline
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
```
### Gradio Web UI
```bash
python app.py
```
### Local Training
```bash
python train_local.py
```
## Deployment
### Deploy to Hugging Face (Model Hub + Space)
The project includes an automated deployment script. It will:
1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo
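
The first step, a size and tensor-count check, might look roughly like the sketch below. It is not the deployment script's actual code: the function name and both thresholds are hypothetical, and the state dict is passed in so the example runs without PyTorch. In real use it would come from something like `torch.load(path, map_location="cpu")`.

```python
import os
import tempfile

MIN_BYTES = 1_000_000  # hypothetical threshold
MIN_TENSORS = 10       # hypothetical threshold


def looks_like_real_checkpoint(path, state_dict,
                               min_bytes=MIN_BYTES, min_tensors=MIN_TENSORS):
    """Return (ok, reason) from a file-size check and a tensor-count check."""
    if not os.path.isfile(path):
        return False, "file not found"
    size = os.path.getsize(path)
    if size < min_bytes:
        return False, f"file too small: {size} bytes"
    # Count entries that look like tensors (anything exposing a .shape attribute).
    n_tensors = sum(1 for v in state_dict.values() if hasattr(v, "shape"))
    if n_tensors < min_tensors:
        return False, f"too few tensors: {n_tensors}"
    return True, "ok"


class FakeTensor:
    """Stand-in for a tensor so the demo needs no ML framework."""
    shape = (1,)


# Demo with a small dummy file and a lowered size threshold.
with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as f:
    f.write(b"\0" * 2048)
demo_state = {f"layer{i}.weight": FakeTensor() for i in range(12)}
result = looks_like_real_checkpoint(f.name, demo_state, min_bytes=1024)
os.remove(f.name)
print(result)
```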
```bash
# Set your token (get one at https://hf.co/settings/tokens)
# On Windows cmd, use: set HF_TOKEN=hf_your_token_here
export HF_TOKEN=hf_your_token_here
# Deploy both model hub and space
python deploy_to_hf.py
# Deploy just the model hub
python deploy_to_hf.py --model-only
# Deploy just the space
python deploy_to_hf.py --space-only
```
The script will prompt for your token if `HF_TOKEN` is not set.
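
A minimal sketch of that fallback, assuming the script reads `HF_TOKEN` from the environment and otherwise asks interactively. The function name `resolve_hf_token` is hypothetical, and the real script may behave differently.

```python
import getpass
import os


def resolve_hf_token(env=None, prompt=getpass.getpass):
    """Return the token from HF_TOKEN, or ask for it interactively if unset."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        # getpass hides the typed token so it does not echo to the terminal.
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("no Hugging Face token provided")
    return token


# With the variable set, no prompt is shown.
print(resolve_hf_token(env={"HF_TOKEN": "hf_your_token_here"}))
```

Passing `env` and `prompt` as parameters keeps the function testable without touching the real environment or terminal.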
### Manual Deployment
#### Model Hub
```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy checkpoint into checkpoints/ directory
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```
#### Space (Gradio Web UI)
1. Go to https://huggingface.co/new-space
2. Set Space name: `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
6. For the model checkpoint, either:
   - Upload via git LFS to the Space repo, or
   - Set `MODEL_PATH` Space secret to point to the model hub
### Self-Hosted
```bash
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```
## Hugging Face Resources
- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true