	---
	language: en
	license: mit
	library_name: pytorch
	tags:
	- diffusion
	- text-to-image
	- dit
	- transformer
	- stl10
	- photorealistic
	pipeline_tag: text-to-image
	inference: true
	widget:
	- text: "a photorealistic cat sitting on a couch, studio lighting"
	sdk: gradio
	sdk_version: "4.44.0"
	app_file: app.py
	python_version: "3.11"
	---

	# Sage-T2I

	Photorealistic Diffusion Transformer: 1024×1024 native generation plus 4K upscale

	A from-scratch Diffusion Transformer (DiT) trained on real photographs from STL-10.
	It generates images natively at 1024×1024 and can upscale them to
	3840×3840 (4K width) with PIL's LANCZOS interpolation.
	No learned super-resolution model (SRGAN, ESRGAN) is involved.

	This is a real trained model with no simulations or mocked outputs. Every pixel of the 1024×1024 output comes from the diffusion process; the 4K version is a resampling of that output.

	| Hub | Link |
	|-----|------|
	| Model | [itriedcoding/sage-t2i](https://huggingface.co/itriedcoding/sage-t2i) |
	| Space | [itriedcoding/sage-t2i](https://huggingface.co/spaces/itriedcoding/sage-t2i) |
	| Source | [GitHub](https://github.com/itriedcoding/sage-t2i) |

	## Model Architecture

	| Component | Details |
	|-----------|---------|
	| Type | Diffusion Transformer (DiT) with cross-attention |
	| Parameters | 43.4M (trained), up to 300M (configurable) |
	| Text Encoder | CLIP ViT-L/14 (frozen) |
	| Image VAE | KL-F8 (frozen) |
	| Hidden Size | 384 |
	| Layers | 12 |
	| Heads | 6 |
	| Config | 384 hidden, 12 layers, 6 heads, 128px train, 1024px inference |
	| Training Resolution | 128x128 latent -> 1024x1024 (pos_embed interpolation) |
	| Upscaling | PIL LANCZOS resampling to 3840x3840 (4K width, square) |
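
The resolutions in the table imply the latent and token grid sizes worked out below. This is a minimal sketch: it assumes the KL-F8 VAE downsamples by 8 per side (as the "F8" in its name indicates) and uses a hypothetical patch size of 2, which this README does not state.

```python
VAE_FACTOR = 8   # KL-F8: 8x spatial downsampling per side
PATCH_SIZE = 2   # assumption: the patch size is not stated in this README


def grid_info(image_px: int, vae_factor: int = VAE_FACTOR, patch: int = PATCH_SIZE) -> dict:
    """Return latent side length, token-grid side length, and token count."""
    if image_px % (vae_factor * patch) != 0:
        raise ValueError("image size must be divisible by vae_factor * patch")
    latent = image_px // vae_factor
    side = latent // patch
    return {"latent": latent, "grid": side, "tokens": side * side}


for px in (128, 256, 512, 1024):
    print(px, grid_info(px))

# Per-head width from the table: 384 hidden units split over 6 heads
HEAD_DIM = 384 // 6
```

Whatever the real patch size is, a 1024-pixel image has (1024 / 128)² = 64 times as many tokens as a 128-pixel one, which is why the position embeddings have to be resized at inference time.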

	## Capabilities
	- Native 1024x1024 generation: a single diffusion pass, no tiling or chaining
	- 4K output: LANCZOS upscale of the 1024x1024 result to 3840x3840
	- Multi-resolution: 256, 512, and 1024 are all supported via pos_embed interpolation
	- Photorealism: trained on real STL-10 photographs, not synthetic data
	- No simulations or mocked outputs: every native-resolution pixel comes from the diffusion process
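
Multi-resolution support relies on resizing the learned position-embedding grid. The sketch below illustrates the idea with bilinear interpolation in plain Python. The function name `resize_pos_embed` is hypothetical, and the actual pipeline may use a different interpolation mode or a tensor library.

```python
def resize_pos_embed(grid, new_size):
    """Bilinearly resize a square grid of embedding vectors.

    grid: list of rows; each row is a list of vectors (lists of floats).
    new_size: side length of the output grid.
    """
    old = len(grid)
    dim = len(grid[0][0])
    # Map each output index to a fractional input coordinate (corner-aligned).
    scale = (old - 1) / (new_size - 1) if new_size > 1 else 0.0
    out = []
    for i in range(new_size):
        y = i * scale
        y0 = int(y)
        y1 = min(y0 + 1, old - 1)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * scale
            x0 = int(x)
            x1 = min(x0 + 1, old - 1)
            wx = x - x0
            # Blend the four neighbouring vectors component by component.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][k] + wx * grid[y0][x1][k])
                + wy * ((1 - wx) * grid[y1][x0][k] + wx * grid[y1][x1][k])
                for k in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out


# A 2x2 grid of 1-dimensional embeddings, resized to 3x3.
small = [[[0.0], [1.0]], [[2.0], [3.0]]]
big = resize_pos_embed(small, 3)
```

The corner embeddings are preserved and the new positions in between are weighted averages of their neighbours, so a model trained on a small grid sees smoothly varying positions on a larger one.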

	## Training
	- Dataset: STL-10 (5000 real labeled photographs, 10 classes)
	- Hardware: CPU (optimized), AMD/NVIDIA GPU support
	- Optimizer: SGD with momentum
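
For reference, one SGD-with-momentum step can be written as below. This is a sketch of the standard update in the form PyTorch documents (velocity = momentum × velocity + gradient, then parameter -= lr × velocity). The learning rate and momentum values are placeholders, since this README does not list the hyperparameters used.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """Update params and velocity in place; return them for convenience."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g   # accumulate past gradients
        params[i] -= lr * velocity[i]              # step along the velocity
    return params, velocity


params = [1.0, -2.0]
velocity = [0.0, 0.0]
sgd_momentum_step(params, [0.5, -1.0], velocity, lr=0.1, momentum=0.9)
```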

	## Usage

	### Local Inference
	```python
	from model.pipeline import SageT2IPipeline

	# Load the trained DiT checkpoint
	pipe = SageT2IPipeline(model_path="checkpoints/dit_best.pt")
	# Generate one image: 50 sampling steps, 1024x1024 output
	image = pipe("a photorealistic cat", num_steps=50, output_size=1024)
	image.save("output.png")
	```

	### Gradio Web UI
	```bash
	python app.py
	```

	### Local Training
	```bash
	python train_local.py
	```

	## Deployment

	### Deploy to Hugging Face (Model Hub + Space)

	The project includes an automated deployment script. It will:
	1. Verify the checkpoint is real (size + tensor count checks)
	2. Create a Model Hub repository with weights, config, and pipeline code
	3. Create a Gradio Space with the interactive web demo
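
The size part of step 1 could look roughly like the sketch below. It is not the code in `deploy_to_hf.py`: the function names, the float32 assumption, and the tolerance are illustrative, and the tensor-count check (which requires loading the checkpoint) is left out.

```python
import os


def expected_checkpoint_bytes(n_params: float, bytes_per_param: int = 4) -> int:
    """Rough weight size, assuming float32 storage (4 bytes per parameter)."""
    return int(n_params * bytes_per_param)


def size_is_plausible(path: str, n_params: float, tolerance: float = 0.5) -> bool:
    """True if the file exists and is at least `tolerance` x the expected weight size.

    Only a lower bound is checked, because a checkpoint may also contain
    optimizer state or metadata that makes it larger than the raw weights.
    """
    if not os.path.isfile(path):
        return False
    return os.path.getsize(path) >= tolerance * expected_checkpoint_bytes(n_params)


# 43.4M parameters at float32 is roughly 174 MB of weights.
print(expected_checkpoint_bytes(43.4e6) / 1e6, "MB")
```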

	```bash
	# Set your token (get one at https://hf.co/settings/tokens)
	export HF_TOKEN=hf_your_token_here   # Windows cmd: set HF_TOKEN=hf_your_token_here

	# Deploy both model hub and space
	python deploy_to_hf.py

	# Deploy just the model hub
	python deploy_to_hf.py --model-only

	# Deploy just the space
	python deploy_to_hf.py --space-only
	```

	The script will prompt for your token if `HF_TOKEN` is not set.
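
That fallback can be sketched as follows. The helper name `resolve_token` and its injectable arguments are hypothetical and only illustrate the behavior described above; the deployment script itself may be structured differently.

```python
import os
from getpass import getpass


def resolve_token(env=os.environ, prompt=getpass) -> str:
    """Return HF_TOKEN from the environment, or ask for it interactively."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # getpass hides the typed token so it does not end up on screen
        token = prompt("Hugging Face token: ").strip()
    if not token:
        raise ValueError("no Hugging Face token provided")
    return token
```

Passing `env` and `prompt` as arguments keeps the function easy to test without touching the real environment or terminal.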

	### Manual Deployment

	#### Model Hub
	```bash
	git lfs install
	git clone https://huggingface.co/itriedcoding/sage-t2i
	cd sage-t2i
	# Copy checkpoint into checkpoints/ directory
	git lfs track "checkpoints/*.pt"
	git add .
	git commit -m "Add model checkpoint"
	git push
	```

	#### Space (Gradio Web UI)
	1. Go to https://huggingface.co/new-space
	2. Set Space name: `sage-t2i`
	3. Select SDK: Gradio
	4. Select hardware: CPU upgrade (recommended)
	5. Upload the Space files (`app.py`, `.space`, `requirements.txt`, model package)
	6. For the model checkpoint, either:
	   - Upload via git LFS to the Space repo, or
	   - Set the `MODEL_PATH` Space secret to point to the model hub
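
Space secrets are exposed to the app as environment variables, so reading `MODEL_PATH` could look like the sketch below. How `app.py` actually consumes the variable is an assumption here; the helper name is hypothetical, and the fallback path is the one used in the Local Inference example.

```python
import os

# Fallback: the checkpoint path used in the Local Inference example
DEFAULT_CHECKPOINT = "checkpoints/dit_best.pt"


def resolve_model_path(env=os.environ) -> str:
    """Prefer the MODEL_PATH variable; fall back to the bundled checkpoint path."""
    return env.get("MODEL_PATH", "").strip() or DEFAULT_CHECKPOINT
```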

	### Self-Hosted
	```bash
	git clone https://huggingface.co/itriedcoding/sage-t2i
	cd sage-t2i
	pip install -r requirements.txt
	python app.py
	```

	## Hugging Face Resources
	- Model Hub: https://huggingface.co/itriedcoding/sage-t2i
	- Gradio Space: https://huggingface.co/spaces/itriedcoding/sage-t2i
	- Duplicate Space: https://huggingface.co/spaces/itriedcoding/sage-t2i?duplicate=true