---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---

# Sage-T2I

**Photorealistic Diffusion Transformer: 1024×1024 generation + 4K upscale**

A from-scratch Diffusion Transformer (DiT) trained on real STL-10 photographs.
It generates images at up to 1024×1024 in a single diffusion pass and can
upscale them to 3840×3840 with Pillow's LANCZOS resampling, a plain
interpolation filter rather than a learned super-resolution model such as
SRGAN or ESRGAN.

**This is a real trained model: every pixel of the 1024×1024 output comes from the diffusion process, and the 3840×3840 version is a LANCZOS resampling of it. No simulations, no mocks, no fakes.**

| Hub | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |

## Model Architecture

| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M in the trained checkpoint; configurable up to 300M |
| **Text encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden size** | 384 |
| **Layers** | 12 |
| **Attention heads** | 6 |
| **Resolution** | Trained at 128×128, sampled at 1024×1024 via positional-embedding (`pos_embed`) interpolation |
| **Upscaling** | PIL LANCZOS resampling to 3840×3840 (4K width) |
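
The table implies a chain of sizes from pixels to transformer tokens. The sketch below works through that arithmetic, assuming KL-F8's 8× spatial downsampling, reading the training resolution as 128 px images, and using a **hypothetical** DiT patch size of 2 (the card does not state the patch size):

```python
# Pixel -> latent -> token arithmetic for a DiT on top of a KL-F8 VAE.
# PATCH_SIZE = 2 is an assumption, not a documented Sage-T2I setting.
VAE_DOWNSAMPLE = 8  # KL-F8 shrinks each spatial side by a factor of 8
PATCH_SIZE = 2      # hypothetical DiT patch size


def token_grid(image_side: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (latent side, tokens per side) for a square image."""
    if image_side % (VAE_DOWNSAMPLE * patch_size):
        raise ValueError("image side must be divisible by VAE factor * patch size")
    latent_side = image_side // VAE_DOWNSAMPLE
    return latent_side, latent_side // patch_size


for side in (128, 256, 512, 1024):
    latent, grid = token_grid(side)
    print(f"{side}px -> {latent}x{latent} latent -> {grid}x{grid} = {grid * grid} tokens")
```

Under these assumptions a 1024×1024 sample has 4,096 tokens against 64 during training, which is why the positional embeddings must be stretched to a larger grid, and why self-attention cost (quadratic in token count) rises steeply with resolution.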

## Capabilities

- **1024×1024 generation** - a single diffusion pass, no tiling or chaining
- **4K output** - LANCZOS resampling of the 1024×1024 result to 3840×3840
- **Multi-resolution** - 256, 512, and 1024 px, all supported via `pos_embed` interpolation
- **Photorealism** - trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel of the 1024×1024 output comes from the diffusion process
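
Positional-embedding interpolation resizes the grid of per-token position vectors used at the training resolution to the larger token grid used at inference. A dependency-free sketch using bilinear, corner-aligned sampling; the interpolation mode is an assumption, and real DiT code would typically do this on tensors with `torch.nn.functional.interpolate`:

```python
# Sketch: bilinear resize of a square positional-embedding grid.
# grid[row][col] is the embedding vector (list of floats) for that token.
# Bilinear, corner-aligned sampling is an assumption; the Sage-T2I code may
# use a different interpolation mode.

Grid = list[list[list[float]]]


def _source_coord(index: int, old_size: int, new_size: int) -> tuple[int, int, float]:
    """Map an output index to (lower index, upper index, blend weight) on the old grid."""
    if new_size == 1:
        return 0, 0, 0.0
    pos = index * (old_size - 1) / (new_size - 1)
    lo = int(pos)
    hi = min(lo + 1, old_size - 1)
    return lo, hi, pos - lo


def interpolate_pos_embed(grid: Grid, new_size: int) -> Grid:
    old_size = len(grid)
    dim = len(grid[0][0])
    out: Grid = []
    for i in range(new_size):
        y0, y1, wy = _source_coord(i, old_size, new_size)
        row = []
        for j in range(new_size):
            x0, x1, wx = _source_coord(j, old_size, new_size)
            # Blend the four surrounding embeddings, one dimension at a time.
            row.append([
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ])
        out.append(row)
    return out


# Example: an 8x8 grid of 4-dim embeddings stretched to a 64x64 token grid.
train_grid = [[[float(r), float(c), 0.0, 1.0] for c in range(8)] for r in range(8)]
big_grid = interpolate_pos_embed(train_grid, 64)
print(len(big_grid), len(big_grid[0]), len(big_grid[0][0]))  # 64 64 4
```

If the embeddings are the fixed 2D sine-cosine kind used by the original DiT, they can instead be recomputed directly at the new grid size; interpolation matters when the embeddings are learned.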

## Training

- **Dataset:** STL-10 (5,000 labeled 96×96 training photographs, 10 classes)
- **Hardware:** CPU (optimized); AMD and NVIDIA GPUs are also supported
- **Optimizer:** SGD with momentum
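
The card names the optimizer but not its settings. For reference, a minimal sketch of the SGD-with-momentum update in the form `torch.optim.SGD` uses when dampening is 0 and Nesterov is off; the learning rate (0.01) and momentum (0.9) are placeholder values, not Sage-T2I's hyperparameters:

```python
# Sketch of one SGD-with-momentum step:
#   v <- momentum * v + grad
#   p <- p - lr * v
# lr=0.01 and momentum=0.9 are placeholders, not the project's settings.


def sgd_momentum_step(params: list[float], grads: list[float], velocity: list[float],
                      lr: float = 0.01, momentum: float = 0.9) -> None:
    """Update params and velocity in place."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g
        params[i] -= lr * velocity[i]


# Toy check: minimise f(x) = x**2 (gradient 2x) starting from x = 5.
x = [5.0]
v = [0.0]
for _ in range(300):
    sgd_momentum_step(x, [2 * x[0]], v)
print(f"x after 300 steps: {x[0]:.2e}")
```

The velocity term accumulates past gradients, so steps along a consistent direction grow while oscillating components partly cancel.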

## Usage

### Local Inference
```python
from model.pipeline import SageT2IPipeline

# Load the trained DiT checkpoint
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")

# Sample a 1024x1024 image with 50 denoising steps
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
```

### Gradio Web UI
```bash
python app.py
```

### Local Training
```bash
python train_local.py
```

## Deployment

### Deploy to Hugging Face (Model Hub + Space)

The project includes an automated deployment script. It will:
1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo
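
Step 1's checks can be sketched as follows. The function name and both thresholds are hypothetical (the card only says the script checks size and tensor count); the point is that a truncated download or a Git LFS pointer file, which is a small text file, fails fast instead of being published:

```python
import os
from collections.abc import Mapping

# Hypothetical thresholds; the deployment script's real limits are not documented.
# 43.4M float32 weights alone take about 174 MB (43.4e6 * 4 bytes).
MIN_BYTES = 100 * 1024 * 1024
MIN_TENSORS = 50


def verify_checkpoint(path: str, state_dict: Mapping[str, object],
                      min_bytes: int = MIN_BYTES, min_tensors: int = MIN_TENSORS) -> None:
    """Raise ValueError if the checkpoint file is suspiciously small or nearly empty.

    `state_dict` maps parameter names to tensors, e.g. the result of
    torch.load(path, map_location="cpu"), possibly nested under a key
    depending on how the checkpoint was saved.
    """
    size = os.path.getsize(path)
    if size < min_bytes:
        raise ValueError(f"{path}: {size} bytes, expected at least {min_bytes}")
    if len(state_dict) < min_tensors:
        raise ValueError(f"{path}: {len(state_dict)} tensors, expected at least {min_tensors}")
```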

```bash
# Set your token (get one at https://hf.co/settings/tokens)
export HF_TOKEN=hf_your_token_here   # Windows cmd: set HF_TOKEN=hf_your_token_here

# Deploy both the Model Hub repo and the Space
python deploy_to_hf.py

# Deploy only the Model Hub repo
python deploy_to_hf.py --model-only

# Deploy only the Space
python deploy_to_hf.py --space-only
```

The script prompts for your token if `HF_TOKEN` is not set.
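
The flag handling and the token fallback described above can be sketched with the standard library alone. This is an illustration, not the contents of `deploy_to_hf.py`: `parse_args`, `deploy_targets`, and `resolve_token` are hypothetical names, and making the two flags mutually exclusive is an assumption:

```python
from __future__ import annotations

import argparse
import getpass
import os


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Deploy Sage-T2I to Hugging Face (sketch)")
    # Assumption: --model-only and --space-only cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--model-only", action="store_true", help="push only the Model Hub repo")
    group.add_argument("--space-only", action="store_true", help="push only the Gradio Space")
    return parser.parse_args(argv)


def deploy_targets(args: argparse.Namespace) -> list[str]:
    """Return which repos to push: both by default, or just one."""
    if args.model_only:
        return ["model"]
    if args.space_only:
        return ["space"]
    return ["model", "space"]


def resolve_token() -> str:
    """Use HF_TOKEN if set; otherwise prompt without echoing the token."""
    token = os.environ.get("HF_TOKEN") or getpass.getpass("Hugging Face token: ")
    token = token.strip()
    if not token:
        raise SystemExit("A Hugging Face token is required.")
    return token
```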

### Manual Deployment

#### Model Hub
```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy the checkpoint into the checkpoints/ directory, then track it with Git LFS
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```

#### Space (Gradio Web UI)
1. Go to https://huggingface.co/new-space
2. Set the Space name: `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
6. For the model checkpoint, either:
   - upload it to the Space repo via Git LFS, or
   - set a `MODEL_PATH` Space secret that points to the Model Hub repo
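
On the Space side, the app has to turn step 6 into a concrete file path. A sketch of one way to do it, relying on the fact that Space secrets are exposed to the app as environment variables; `resolve_model_path`, the checkpoint location inside the Hub repo, and reading `MODEL_PATH` as either a local file path or a repo id are all assumptions. `hf_hub_download` from the `huggingface_hub` package downloads a single file and returns its local cache path:

```python
import os

# Assumptions: the checkpoint sits at this path in both the Space and the
# Model Hub repo (it matches the Usage example), and MODEL_PATH holds either
# a local file path or a Model Hub repo id.
CHECKPOINT_FILE = "checkpoints/dit_best.pt"
DEFAULT_REPO = "itriedcoding/sage-t2i"


def resolve_model_path() -> str:
    """Return a local path to the checkpoint, downloading it from the Hub if needed."""
    value = os.environ.get("MODEL_PATH", "").strip()  # Space secrets arrive as env vars
    if value and os.path.isfile(value):
        return value  # explicit local file
    if not value and os.path.isfile(CHECKPOINT_FILE):
        return CHECKPOINT_FILE  # checkpoint uploaded to the Space via Git LFS
    # Otherwise treat MODEL_PATH (or the default) as a Model Hub repo id.
    from huggingface_hub import hf_hub_download  # imported lazily: only this path needs it
    return hf_hub_download(repo_id=value or DEFAULT_REPO, filename=CHECKPOINT_FILE)
```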

### Self-Hosted
```bash
git lfs install   # needed so the clone fetches the real weights, not LFS pointer files
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```

## Hugging Face Resources
- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true