
🌌 Illuma - Truly Open Source Image Generation

Illuma is an image generation model cloned from Salesforce/BLIP3o-NEXT-GRPO-TexT-3B, the first truly open-source image generation model, with:

  • ✅ Training code released (Apache 2.0)
  • ✅ Datasets released (BLIP3o-Pretrain + BLIP3o-60K)
  • ✅ No usage restrictions (Apache 2.0 license)
  • ✅ Can be renamed, rebranded, and refined

Architecture

AR (3B Qwen2.5-VL) + SANA 1.5 Diffusion Decoder

Illuma uses a two-stage generation process:

  1. Autoregressive model generates visual tokens from text prompt
  2. SANA 1.5 diffusion decoder converts visual tokens to a high-quality image

The GRPO (Group Relative Policy Optimization) RL training improves text rendering in generated images (GenEval 0.73 → 0.90).
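The two-stage flow above can be sketched in a few lines. This is a purely illustrative stub, not the real model: the actual stage 1 is a 3B Qwen2.5-VL autoregressive backbone and the actual stage 2 is a SANA 1.5 diffusion decoder, while `generate_visual_tokens` and `decode_to_image` below are hypothetical stand-ins that only show the shape of the pipeline (729 visual tokens, i.e. a 27×27 grid):

```python
import random

SEQ_LEN = 729          # number of visual tokens, matching the API's seq_len parameter
CODEBOOK_SIZE = 8192   # hypothetical visual-token vocabulary size

def generate_visual_tokens(prompt: str, seq_len: int = SEQ_LEN) -> list[int]:
    """Stage 1 (stub): the AR model would sample one visual token at a time,
    conditioned on the prompt and all previously generated tokens."""
    rng = random.Random(prompt)  # deterministic stand-in for conditioned sampling
    return [rng.randrange(CODEBOOK_SIZE) for _ in range(seq_len)]

def decode_to_image(tokens: list[int], size: int = 27) -> list[list[int]]:
    """Stage 2 (stub): the diffusion decoder would render pixels from the
    tokens; here we just reshape the 729 tokens into a 27x27 grid."""
    assert len(tokens) == size * size
    return [tokens[i * size:(i + 1) * size] for i in range(size)]

tokens = generate_visual_tokens("A neon sign that says 'ILLUMA'")
grid = decode_to_image(tokens)
print(len(tokens), len(grid), len(grid[0]))  # 729 27 27
```

The point of the split is that the AR stage handles prompt understanding and layout (where GRPO training helps most, e.g. spelling out text correctly), while the diffusion stage only has to turn a token grid into high-quality pixels.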

🚀 Deploy on Hugging Face Inference Endpoints

This model includes a custom handler (handler.py) for deployment on HF Inference Endpoints:

  1. Go to Inference Endpoints
  2. Click "+ New endpoint"
  3. Select fuhaddesmond/illuma as the model repository
  4. Select AWS → NVIDIA T4 ($0.50/hr) or NVIDIA A10G ($1.00/hr)
  5. Set Task to Custom
  6. Click Create Endpoint
  7. Once deployed, call the API:
import base64
from io import BytesIO

import requests
from PIL import Image

API_URL = "https://YOUR_ENDPOINT_ID.aws.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer hf_YOUR_TOKEN"}

payload = {
    "inputs": "A neon sign that says 'ILLUMA' glowing in purple against a dark wall",
    "parameters": {
        "seq_len": 729,
        "top_p": 0.95,
        "top_k": 1200
    }
}

response = requests.post(API_URL, headers=headers, json=payload)

# The endpoint returns the generated image as a base64-encoded string
image_data = base64.b64decode(response.json()["image"])
image = Image.open(BytesIO(image_data))
image.save("illuma_output.png")

🔧 Local Inference

# Clone BLIP3o repo (BLIP3o-NEXT branch)
git clone --branch BLIP3o-NEXT --single-branch https://github.com/JiuhaiChen/BLIP3o.git
cd BLIP3o
pip install -e .

# Download model
python -c "from huggingface_hub import snapshot_download; print(snapshot_download(repo_id='fuhaddesmond/illuma', repo_type='model'))"

# Run inference
python inference.py /path/to/downloaded/model

Download

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="fuhaddesmond/illuma",
    repo_type="model"
)

Model Details

Detail          Value
Base Model      BLIP3o-NEXT-GRPO-TexT-3B
Parameters      ~4B (3B AR + diffusion decoder)
Architecture    Qwen2.5-VL + SANA 1.5
License         Apache 2.0
GRPO Training   GenEval 0.73 → 0.90
Specialty       Text rendering in images

Citation

@article{chen2025blip3,
  title={BLIP3-o: A Family of Fully Open Unified Multimodal Models},
  author={Chen, Jiuhai and others},
  journal={arXiv preprint arXiv:2505.09568},
  year={2025}
}