---
language: en
license: mit
library_name: pytorch
tags:
- diffusion
- text-to-image
- dit
- transformer
- stl10
- photorealistic
pipeline_tag: text-to-image
inference: true
widget:
- text: "a photorealistic cat sitting on a couch, studio lighting"
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
python_version: "3.11"
---

# Sage-T2I

**Photorealistic Diffusion Transformer: 1024×1024 native generation + 4K upscale**

A from-scratch Diffusion Transformer (DiT) trained on real photographs from STL-10. It generates images natively at 1024×1024 and can upscale them to 3840×3840 (4K width) with PIL LANCZOS interpolation. No SRGAN, ESRGAN, or other learned upscaler is involved.

**This is a real trained model. Every pixel of the native 1024×1024 output comes from the diffusion process. No simulations, no mocks, no fakes.**

| Hub | Link |
|-----|------|
| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |

## Model Architecture

| Component | Details |
|-----------|---------|
| **Type** | Diffusion Transformer (DiT) with cross-attention |
| **Parameters** | 43.4M (trained), up to 300M (configurable) |
| **Text Encoder** | CLIP ViT-L/14 (frozen) |
| **Image VAE** | KL-F8 (frozen) |
| **Hidden Size** | 384 |
| **Layers** | 12 |
| **Heads** | 6 |
| **Config** | 384 hidden, 12 layers, 6 heads, 128px train, 1024px inference |
| **Training Resolution** | 128×128 latent -> 1024×1024 (pos_embed interpolation) |
| **Upscaling** | PIL LANCZOS to 3840×3840 (4K width) |

## Capabilities

- **Native 1024×1024 generation** - real diffusion, no tiling or chaining
- **4K output** - LANCZOS upscale to 3840×3840
- **Multi-resolution** - 256, 512, and 1024 are all supported via pos_embed interpolation
- **Photorealism** - trained on real STL-10 photographs, not synthetic data
- **No simulations, no fakes** - every pixel of the native output comes from the diffusion process

## Training

- **Dataset:** STL-10 (5000 real labeled photographs, 10 classes)
- **Hardware:** CPU (optimized), AMD/NVIDIA GPU support
- **Optimizer:** SGD with momentum

## Usage

### Local Inference

```python
from model.pipeline import SageT2IPipeline

# Load the trained DiT checkpoint and generate one image
pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
image.save("output.png")
```

### Gradio Web UI

```bash
python app.py
```

### Local Training

```bash
python train_local.py
```

## Deployment

### Deploy to Hugging Face (Model Hub + Space)

The project includes an automated deployment script. It will:

1. Verify the checkpoint is real (size + tensor count checks)
2. Create a **Model Hub repository** with weights, config, and pipeline code
3. Create a **Gradio Space** with the interactive web demo

```bash
# Set your token (get one at https://hf.co/settings/tokens)
# On Windows cmd, use: set HF_TOKEN=hf_your_token_here
export HF_TOKEN=hf_your_token_here

# Deploy both model hub and space
python deploy_to_hf.py

# Deploy just the model hub
python deploy_to_hf.py --model-only

# Deploy just the space
python deploy_to_hf.py --space-only
```

The script will prompt for your token if `HF_TOKEN` is not set.

### Manual Deployment

#### Model Hub

```bash
git lfs install
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
# Copy checkpoint into checkpoints/ directory
git lfs track "checkpoints/*.pt"
git add .
git commit -m "Add model checkpoint"
git push
```

#### Space (Gradio Web UI)

1. Go to https://huggingface.co/new-space
2. Set the Space name to `sage-t2i`
3. Select SDK: **Gradio**
4. Select hardware: **CPU upgrade** (recommended)
5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
6. For the model checkpoint, either:
   - Upload it via Git LFS to the Space repo, or
   - Set the `MODEL_PATH` Space secret to point to the Model Hub repository

### Self-Hosted

```bash
git clone https://huggingface.co/itriedcoding/sage-t2i
cd sage-t2i
pip install -r requirements.txt
python app.py
```

## HuggingFace Resources

- **Model Hub:** https://huggingface.co/itriedcoding/sage-t2i
- **Gradio Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i
- **Duplicate Space:** https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true
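
The Deployment section notes that the script prompts for a token when `HF_TOKEN` is not set. A minimal sketch of that lookup order might look like the following. The helper name `resolve_hf_token` and its parameters are hypothetical and are not taken from `deploy_to_hf.py`; the actual script may handle this differently.

```python
import getpass
import os


def resolve_hf_token(env=None, prompt=getpass.getpass):
    """Return a token from HF_TOKEN, or ask for one interactively.

    Hypothetical helper for illustration only; not part of deploy_to_hf.py.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        # Fall back to a prompt; getpass does not echo the typed token.
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("no Hugging Face token provided")
    return token
```

Passing `env` and `prompt` as arguments keeps the lookup testable without touching the real environment or requiring a terminal.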