SeFi-Image Non-Commercial License Agreement

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

By clicking "Agree and access repository", you acknowledge that you have read and agree to the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0). You agree to use SeFi-Image checkpoints for non-commercial purposes only and to comply with all applicable laws and responsible AI use requirements.

SeFi-Image

SeFi-Image is a text-to-image foundation model family built with Semantic-First Diffusion. It separates generation into semantic and texture latent streams, denoising semantic structure slightly ahead of texture details. This design gives the texture stream a cleaner structural anchor and improves the reconstruction-generation trade-off in latent diffusion.
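The card does not say how far ahead the semantic stream runs or how its timesteps are parameterized. As a minimal sketch of the idea, assume a linear time grid from 1.0 (pure noise) to 0.0 (clean) and a hypothetical offset parameter. The semantic stream can then follow a compressed copy of the texture schedule, so both streams start from noise together but the semantic one reaches the clean state earlier. This is an illustration, not the schedule used in the SeFi-Image report.

```python
def semantic_first_schedule(num_steps: int, offset: float = 0.1):
    """Return (t_semantic, t_texture) pairs for a hypothetical two-stream sampler.

    Time runs from 1.0 (pure noise) to 0.0 (clean). The semantic time is a
    rescaled copy of the texture time that hits 0.0 once t_texture <= offset,
    so semantic structure is always at least as denoised as texture.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 <= offset < 1.0:
        raise ValueError("offset must be in [0, 1)")
    pairs = []
    for i in range(num_steps + 1):
        t_texture = 1.0 - i / num_steps
        # Compress the semantic schedule so it finishes `offset` earlier.
        t_semantic = max(0.0, (t_texture - offset) / (1.0 - offset))
        pairs.append((t_semantic, t_texture))
    return pairs


for t_sem, t_tex in semantic_first_schedule(10, offset=0.1):
    print(f"semantic t={t_sem:.3f}  texture t={t_tex:.3f}")
```

With 10 steps and offset=0.1, the semantic time reaches 0.0 one step before the texture time does. The final texture update is therefore conditioned on a fully denoised structural anchor, which is the intuition behind "semantic first."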

Highlights

Semantic-first generation
Semantic latents denoise ahead of texture latents, providing a cleaner structural anchor for image synthesis.

Faster training
The 5B model reaches strong benchmark performance with about 125K A800 GPU hours.

Better generation-reconstruction trade-off
A high-fidelity texture latent preserves reconstruction detail, while a compact semantic latent simplifies generation.
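To express the 125K A800 GPU-hour figure in wall-clock terms, divide it by the number of GPUs and then by 24. The card does not state the training cluster size, so the GPU counts below are hypothetical. The conversion also assumes perfect parallel scaling with no downtime.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Convert a GPU-hour budget to wall-clock days, assuming perfect scaling."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus / 24.0


# 125K GPU hours is the figure quoted above; the cluster sizes are hypothetical.
for n in (256, 512, 1024):
    print(f"{n:5d} GPUs -> {wall_clock_days(125_000, n):.1f} days")
```

At 256, 512, and 1,024 GPUs, this works out to roughly 20.3, 10.2, and 5.1 days, respectively.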

Performance

The following numbers are taken from the main evaluation tables in the SeFi-Image technical report and summarize SeFi-Image-5B across representative benchmarks.

Model Zoo

Family	Model	Checkpoint (Hugging Face repo id)	Default steps	Guidance scale
Base	SeFi-Image-1B-Base	SeFi-Image/SeFi-Image-1B-Base	50	4.0
Base	SeFi-Image-2B-Base	SeFi-Image/SeFi-Image-2B-Base	50	4.0
Base	SeFi-Image-5B-Base	SeFi-Image/SeFi-Image-5B-Base	50	4.0
RL	SeFi-Image-5B-RL	SeFi-Image/SeFi-Image-5B-RL	50	4.0
Turbo	SeFi-Image-1B-turbo	SeFi-Image/SeFi-Image-1B-turbo	4	1.0
Turbo	SeFi-Image-2B-turbo	SeFi-Image/SeFi-Image-2B-turbo	4	1.0
Turbo	SeFi-Image-5B-turbo	SeFi-Image/SeFi-Image-5B-turbo	4	1.0
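Scripts that switch between checkpoints can keep the Model Zoo defaults in a small lookup table. The sketch below transcribes the table into a dictionary. The names MODEL_ZOO, SamplingDefaults, and sampling_defaults are hypothetical helpers and are not part of the SeFi package.

```python
from typing import NamedTuple


class SamplingDefaults(NamedTuple):
    family: str
    steps: int       # default number of inference steps
    guidance: float  # default guidance scale


# Transcribed from the Model Zoo table; keys are Hugging Face repo ids.
MODEL_ZOO = {
    "SeFi-Image/SeFi-Image-1B-Base": SamplingDefaults("Base", 50, 4.0),
    "SeFi-Image/SeFi-Image-2B-Base": SamplingDefaults("Base", 50, 4.0),
    "SeFi-Image/SeFi-Image-5B-Base": SamplingDefaults("Base", 50, 4.0),
    "SeFi-Image/SeFi-Image-5B-RL": SamplingDefaults("RL", 50, 4.0),
    "SeFi-Image/SeFi-Image-1B-turbo": SamplingDefaults("Turbo", 4, 1.0),
    "SeFi-Image/SeFi-Image-2B-turbo": SamplingDefaults("Turbo", 4, 1.0),
    "SeFi-Image/SeFi-Image-5B-turbo": SamplingDefaults("Turbo", 4, 1.0),
}


def sampling_defaults(repo_id: str) -> SamplingDefaults:
    """Look up the table defaults for a checkpoint, failing loudly on typos."""
    if repo_id not in MODEL_ZOO:
        raise KeyError(f"unknown SeFi-Image checkpoint: {repo_id!r}")
    return MODEL_ZOO[repo_id]


defaults = sampling_defaults("SeFi-Image/SeFi-Image-5B-turbo")
print(defaults.steps, defaults.guidance)
```

The two values map onto the num_inference_steps and guidance_scale arguments used in the Python API. Repo ids are case-sensitive: the turbo checkpoints use a lowercase "turbo".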

Quick Start

Install the SeFi inference code and its dependencies from the SeFi-Image inference repository, then pass a Hugging Face checkpoint repo id to --checkpoint:

python inference.py \
  --checkpoint SeFi-Image/SeFi-Image-5B-Base \
  --prompt "A red apple on a wooden table." \
  --output-dir outputs/inference/sefi_5b_base

Turbo checkpoints use the same command pattern, with --steps set to 4 to match the Model Zoo table:

python inference.py \
  --checkpoint SeFi-Image/SeFi-Image-5B-turbo \
  --prompt "A blue ceramic mug on a white desk." \
  --steps 4 \
  --output-dir outputs/inference/sefi_5b_turbo
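For scripted runs, building the command as an argument list instead of a shell string avoids quoting problems with prompts that contain spaces or quotes. The sketch below uses only the flags shown above (--checkpoint, --prompt, --steps, --output-dir). The helper build_inference_command is hypothetical. It also assumes that inference.py falls back to its own default step count when --steps is omitted, as the Base example suggests.

```python
import shlex


def build_inference_command(checkpoint, prompt, output_dir, steps=None):
    """Return an argv list for inference.py using only the documented flags."""
    argv = ["python", "inference.py", "--checkpoint", checkpoint, "--prompt", prompt]
    if steps is not None:  # omit --steps to keep the script's default
        argv += ["--steps", str(steps)]
    argv += ["--output-dir", output_dir]
    return argv


cmd = build_inference_command(
    "SeFi-Image/SeFi-Image-5B-turbo",
    "A blue ceramic mug on a white desk.",
    "outputs/inference/sefi_5b_turbo",
    steps=4,
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

You can pass the list directly to subprocess.run(cmd, check=True) from inside the inference repository. No shell is involved, so the prompt does not need escaping.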

Python API:

from sefi import SEFIInferencePipeline

pipe = SEFIInferencePipeline.from_pretrained(
    "SeFi-Image/SeFi-Image-5B-Base",
)
images = pipe(
    "A red apple on a wooden table.",
    seed=42,
)
images[0].save("example.png")

Turbo checkpoints use the same API with the turbo settings from the Model Zoo table (4 steps, guidance scale 1.0):

from sefi import SEFIInferencePipeline

pipe = SEFIInferencePipeline.from_pretrained(
    "SeFi-Image/SeFi-Image-5B-turbo",
)
images = pipe(
    "A blue ceramic mug on a white desk.",
    num_inference_steps=4,
    guidance_scale=1.0,
    seed=123,
)
images[0].save("turbo_example.png")

Intended Use

SeFi-Image is intended for research and creative text-to-image generation, including prompt following, bilingual text rendering, style exploration, and model development. The Base checkpoints are suitable starting points for fine-tuning and analysis. Turbo checkpoints are intended for fast generation. The RL checkpoint is intended for stronger alignment-oriented generation.

Citation

If you find SeFi-Image useful, please cite the paper:

@misc{sefiteam2026sefiimagetexttoimagefoundationmodel,
      title={SeFi-Image: A Text-to-Image Foundation Model with Semantic-First Diffusion}, 
      author={SeFi-Team},
      year={2026},
      eprint={2606.22568},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.22568}, 
}

