SeFi-Image Non-Commercial License Agreement
Access to this repository requires accepting the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). You agree to use SeFi-Image checkpoints for non-commercial purposes only and to comply with all applicable laws and responsible AI use requirements.
SeFi-Image
SeFi-Image is a text-to-image foundation model family built with Semantic-First Diffusion. It separates generation into semantic and texture latent streams, denoising semantic structure slightly ahead of texture details. This design gives the texture stream a cleaner structural anchor and improves the reconstruction-generation trade-off in latent diffusion.
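The ordering described above can be pictured with a toy schedule in which both streams start from pure noise, but the semantic stream begins denoising a few steps earlier and therefore finishes earlier. This is only a sketch of the idea, not the schedule SeFi-Image uses: the linear noise levels, the `lead_steps` offset, and the function name `semantic_first_schedule` are all assumptions made for illustration.

```python
def semantic_first_schedule(num_steps: int, lead_steps: int):
    """Return per-step (semantic, texture) noise levels.

    Noise levels run from 1.0 (pure noise) to 0.0 (clean). The linear
    schedule and the integer `lead_steps` offset are illustrative
    assumptions, not values taken from the SeFi-Image report.
    """
    if not 0 <= lead_steps < num_steps:
        raise ValueError("lead_steps must satisfy 0 <= lead_steps < num_steps")
    span = num_steps - lead_steps  # number of steps each stream spends denoising
    schedule = []
    for i in range(num_steps + 1):
        # The semantic stream starts at step 0 and is clean after `span` steps.
        t_semantic = 1.0 - min(i, span) / span
        # The texture stream waits `lead_steps` steps, then denoises over `span` steps.
        t_texture = 1.0 - max(i - lead_steps, 0) / span
        schedule.append((t_semantic, t_texture))
    return schedule


for step, (t_sem, t_tex) in enumerate(semantic_first_schedule(10, 2)):
    print(f"step {step:2d}: semantic={t_sem:.3f} texture={t_tex:.3f}")
```

At every step the semantic noise level is less than or equal to the texture noise level, so the texture stream is always conditioned on structure that is at least as clean as itself.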
Highlights
| Highlight | Description |
|---|---|
| Semantic-first generation | Semantic latents denoise ahead of texture latents, providing a cleaner structural anchor for image synthesis. |
| Faster training | The 5B model reaches strong benchmark performance with about 125K A800 GPU hours. |
| Better generation-reconstruction trade-off | A high-fidelity texture latent preserves reconstruction detail, while a compact semantic latent simplifies generation. |
Performance
The numbers below are taken from the main evaluation tables in the SeFi-Image technical report and summarize SeFi-Image-5B across representative benchmarks.
Model Zoo
| Family | Model | Checkpoint | Steps | Guidance |
|---|---|---|---|---|
| Base | SeFi-Image-1B-Base | SeFi-Image/SeFi-Image-1B-Base | 50 | 4.0 |
| Base | SeFi-Image-2B-Base | SeFi-Image/SeFi-Image-2B-Base | 50 | 4.0 |
| Base | SeFi-Image-5B-Base | SeFi-Image/SeFi-Image-5B-Base | 50 | 4.0 |
| RL | SeFi-Image-5B-RL | SeFi-Image/SeFi-Image-5B-RL | 50 | 4.0 |
| Turbo | SeFi-Image-1B-turbo | SeFi-Image/SeFi-Image-1B-turbo | 4 | 1.0 |
| Turbo | SeFi-Image-2B-turbo | SeFi-Image/SeFi-Image-2B-turbo | 4 | 1.0 |
| Turbo | SeFi-Image-5B-turbo | SeFi-Image/SeFi-Image-5B-turbo | 4 | 1.0 |
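The recommended settings in the table can be kept in a small lookup so that scripts pick matching defaults for each checkpoint. The sketch below is illustrative only: `RECOMMENDED_SETTINGS` and `recommended_settings` are hypothetical names that are not part of the SeFi inference code, and the values are copied from the table above.

```python
# Hypothetical helper (not part of the SeFi inference code): recommended
# sampling settings per checkpoint family, copied from the Model Zoo table.
RECOMMENDED_SETTINGS = {
    "base": {"steps": 50, "guidance": 4.0},
    "rl": {"steps": 50, "guidance": 4.0},
    "turbo": {"steps": 4, "guidance": 1.0},
}


def recommended_settings(checkpoint: str) -> dict:
    """Return the table's steps and guidance for a checkpoint repo id."""
    # The family is the last dash-separated token of the repo id,
    # e.g. "SeFi-Image/SeFi-Image-5B-turbo" -> "turbo".
    family = checkpoint.rsplit("-", 1)[-1].lower()
    if family not in RECOMMENDED_SETTINGS:
        raise ValueError(f"Unknown checkpoint family in {checkpoint!r}")
    # Return a copy so callers cannot mutate the shared table.
    return dict(RECOMMENDED_SETTINGS[family])
```

For example, `recommended_settings("SeFi-Image/SeFi-Image-5B-turbo")` yields 4 steps and a guidance value of 1.0, matching the Turbo rows.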
Quick Start
Install the SeFi inference code and dependencies from the SeFi-Image inference repository, then pass a Hugging Face checkpoint repo id:
python inference.py \
  --checkpoint SeFi-Image/SeFi-Image-5B-Base \
  --prompt "A red apple on a wooden table." \
  --output-dir outputs/inference/sefi_5b_base
Turbo checkpoints use the same command pattern:
python inference.py \
  --checkpoint SeFi-Image/SeFi-Image-5B-turbo \
  --prompt "A blue ceramic mug on a white desk." \
  --steps 4 \
  --output-dir outputs/inference/sefi_5b_turbo
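To run the command line for several prompts or checkpoints, the argument list can be assembled in Python. The following is a minimal sketch that uses only the flags shown in the two commands above; `build_inference_command` is a hypothetical helper, and it assumes `inference.py` is in the current working directory.

```python
import shlex
import sys


def build_inference_command(checkpoint, prompt, output_dir, steps=None):
    """Build the argv list for inference.py from the flags shown above."""
    cmd = [
        sys.executable, "inference.py",
        "--checkpoint", checkpoint,
        "--prompt", prompt,
        "--output-dir", output_dir,
    ]
    # --steps is only added when requested, as in the Turbo command.
    if steps is not None:
        cmd += ["--steps", str(steps)]
    return cmd


cmd = build_inference_command(
    "SeFi-Image/SeFi-Image-5B-turbo",
    "A blue ceramic mug on a white desk.",
    "outputs/inference/sefi_5b_turbo",
    steps=4,
)
# Print a copy-pastable shell line. To execute it instead, pass the list to
# subprocess.run(cmd, check=True) from the directory containing inference.py.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems when prompts contain quotes or other special characters.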
Python API:
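from sefi import SEFIInferencePipeline

# Load the checkpoint from its Hugging Face repo id.
pipe = SEFIInferencePipeline.from_pretrained(
    "SeFi-Image/SeFi-Image-5B-Base",
)

# Generate from a text prompt with a fixed seed.
images = pipe(
    "A red apple on a wooden table.",
    seed=42,
)

# Save the first returned image.
images[0].save("example.png")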
Turbo checkpoints use the same API:
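from sefi import SEFIInferencePipeline

# Load the Turbo checkpoint from its Hugging Face repo id.
pipe = SEFIInferencePipeline.from_pretrained(
    "SeFi-Image/SeFi-Image-5B-turbo",
)

# Turbo settings from the Model Zoo table: 4 steps, guidance 1.0.
images = pipe(
    "A blue ceramic mug on a white desk.",
    num_inference_steps=4,
    guidance_scale=1.0,
    seed=123,
)

# Save the first returned image.
images[0].save("turbo_example.png")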
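Because the pipeline call accepts a `seed` keyword, a seed sweep is a simple way to compare variations of one prompt. The helper below is a sketch, not part of the `sefi` package: `generate_seed_sweep` is a hypothetical name, and it assumes only the call shape shown above (prompt as the first argument, `seed` as a keyword, and an indexable collection of images as the return value).

```python
def generate_seed_sweep(pipe, prompt, seeds, **pipe_kwargs):
    """Call the pipeline once per seed and return {seed: first image}."""
    results = {}
    for seed in seeds:
        # Mirrors the call shape above: prompt first, seed as a keyword.
        # Extra keyword arguments are forwarded unchanged to the pipeline.
        images = pipe(prompt, seed=seed, **pipe_kwargs)
        results[seed] = images[0]
    return results
```

With a Turbo pipeline loaded as above, `generate_seed_sweep(pipe, "A blue ceramic mug on a white desk.", [1, 2, 3], num_inference_steps=4, guidance_scale=1.0)` would return one image per seed.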
Intended Use
SeFi-Image is intended for research and creative text-to-image generation, including prompt following, bilingual text rendering, style exploration, and model development. The Base checkpoints are suitable starting points for fine-tuning and analysis. Turbo checkpoints are intended for fast generation. The RL checkpoint is intended for stronger alignment-oriented generation.
Citation
If you find SeFi-Image useful, please cite the paper:
@misc{sefiteam2026sefiimagetexttoimagefoundationmodel,
  title={SeFi-Image: A Text-to-Image Foundation Model with Semantic-First Diffusion},
  author={SeFi-Team},
  year={2026},
  eprint={2606.22568},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.22568},
}


