🌌 Illuma - Truly Open Source Image Generation

Illuma is an image generation model cloned from Salesforce/BLIP3o-NEXT-GRPO-TexT-3B — the first truly open-source image generation model with:

✅ Training code released (Apache 2.0)
✅ Datasets released (BLIP3o-Pretrain + BLIP3o-60K)
✅ No usage restrictions (Apache 2.0 license)
✅ Can be renamed, rebranded, and refined

Architecture

AR (3B Qwen2.5-VL) + SANA 1.5 Diffusion Decoder

Illuma uses a two-stage generation process:

1. The autoregressive model generates visual tokens from the text prompt.
2. The SANA 1.5 diffusion decoder converts the visual tokens into a high-quality image.
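
The first stage can be pictured as ordinary token-by-token sampling. The sketch below is a toy illustration, not the model's actual code: `toy_model` is a hypothetical stand-in for the autoregressive network, the five-token vocabulary is made up, and applying top-k before top-p is an assumption about how the `top_k` and `top_p` endpoint parameters interact.

```python
import random


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize what is left."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_visual_tokens(next_token_probs, seq_len, top_k, top_p, seed=0):
    """Sample seq_len token ids, one at a time, from the filtered distribution."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(seq_len):
        candidates = filter_top_k_top_p(next_token_probs(tokens), top_k, top_p)
        ids = list(candidates)
        weights = [candidates[i] for i in ids]
        tokens.append(rng.choices(ids, weights=weights, k=1)[0])
    return tokens


def toy_model(prefix):
    # Hypothetical stand-in: a fixed distribution over a 5-token vocabulary.
    return [0.5, 0.2, 0.15, 0.1, 0.05]


tokens = sample_visual_tokens(toy_model, seq_len=8, top_k=3, top_p=0.8)
print(tokens)
```

In the real model the resulting token sequence would be handed to the diffusion decoder; that second stage is not sketched here.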

The GRPO (Group Relative Policy Optimization) reinforcement-learning stage targets text rendering in generated images; the reported GenEval score rises from 0.73 to 0.90.
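
The core idea of GRPO is to score each sample relative to the other samples drawn for the same prompt. The sketch below shows only that group-relative normalization. It is a generic illustration, not the BLIP3o training code: the reward values are made up, and using the population standard deviation with a small `eps` is an assumption, since implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images generated from one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Samples that beat the group average get a positive advantage and are reinforced; samples below it are pushed down. When every sample in a group receives the same reward, all advantages are zero and the group contributes no learning signal.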

🚀 Deploy on Hugging Face Inference Endpoints

This model includes a custom handler (handler.py) for deployment on HF Inference Endpoints:

1. Go to Inference Endpoints.
2. Click "+ New endpoint".
3. Select fuhaddesmond/illuma as the model repository.
4. Select AWS → NVIDIA T4 ($0.50/hr) or NVIDIA A10G ($1.00/hr).
5. Set Task to Custom.
6. Click Create Endpoint.
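
When choosing between the two GPUs, a rough back-of-the-envelope estimate can help. The sketch below assumes about 4B parameters stored at 2 bytes each (for example BF16), uses the published memory sizes of the T4 (16 GB) and A10G (24 GB), and takes the hourly prices quoted in the steps above. It counts weights only, so activations and the diffusion decoder's working memory come on top; the function names are illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def monthly_cost_usd(hourly_rate: float, hours_per_day: float = 24, days: int = 30) -> float:
    """Cost of keeping an endpoint running for the given period."""
    return hourly_rate * hours_per_day * days


# (name, GPU memory in GB, price in USD per hour as quoted above)
GPUS = [("NVIDIA T4", 16, 0.50), ("NVIDIA A10G", 24, 1.00)]

weights_gb = weight_memory_gb(4e9)  # assumption: ~4B parameters at 2 bytes each
for name, vram_gb, rate in GPUS:
    print(
        f"{name}: {vram_gb - weights_gb:.0f} GB left after weights, "
        f"about ${monthly_cost_usd(rate):.0f} per 30 days if always on"
    )
```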
Once deployed, call the API:

import base64
from io import BytesIO

import requests
from PIL import Image

API_URL = "https://YOUR_ENDPOINT_ID.aws.endpoints.huggingface.cloud"
headers = {"Authorization": "Bearer hf_YOUR_TOKEN"}

payload = {
    "inputs": "A neon sign that says 'ILLUMA' glowing in purple against a dark wall",
    "parameters": {
        "seq_len": 729,
        "top_p": 0.95,
        "top_k": 1200
    }
}

# Image generation can take a while, so allow a generous timeout.
response = requests.post(API_URL, headers=headers, json=payload, timeout=300)
response.raise_for_status()  # fail early on auth errors or an endpoint that is not ready

# The response carries the image as a base64 string under the "image" key.
image_data = base64.b64decode(response.json()["image"])
image = Image.open(BytesIO(image_data))
image.save("illuma_output.png")
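
For repeated calls it can be convenient to wrap the request body and the response decoding in small helpers. The sketch below is one possible shape, not part of the repository: `build_payload` and `extract_image_bytes` are hypothetical names, the defaults mirror the example above, and the validation ranges are assumptions rather than limits documented by the handler.

```python
import base64


def build_payload(prompt, seq_len=729, top_p=0.95, top_k=1200):
    """Build the JSON body for the endpoint, with basic sanity checks."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    if top_k < 1 or seq_len < 1:
        raise ValueError("top_k and seq_len must be positive")
    return {
        "inputs": prompt,
        "parameters": {"seq_len": seq_len, "top_p": top_p, "top_k": top_k},
    }


def extract_image_bytes(body):
    """Decode the base64 image from a parsed JSON response."""
    if "image" not in body:
        raise KeyError(f"no 'image' field in response, got keys: {sorted(body)}")
    return base64.b64decode(body["image"])


# Offline check with a fake response, so no endpoint is needed.
fake_response = {"image": base64.b64encode(b"not-a-real-image").decode("ascii")}
payload = build_payload("A neon sign that says 'ILLUMA'")
print(payload["parameters"], len(extract_image_bytes(fake_response)))
```

With a live endpoint, `build_payload(...)` would be passed as `json=` to `requests.post`, and `extract_image_bytes(response.json())` would replace the manual decoding step.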

🔧 Local Inference

# Clone BLIP3o repo (BLIP3o-NEXT branch)
git clone --branch BLIP3o-NEXT --single-branch https://github.com/JiuhaiChen/BLIP3o.git
cd BLIP3o
pip install -e .

# Download model
python -c "from huggingface_hub import snapshot_download; print(snapshot_download(repo_id='fuhaddesmond/illuma', repo_type='model'))"

# Run inference
python inference.py /path/to/downloaded/model

Download

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="fuhaddesmond/illuma",
    repo_type="model"
)

Model Details

Detail	Value
Base Model	BLIP3o-NEXT-GRPO-TexT-3B
Parameters	~4B (3B AR + diffusion decoder)
Architecture	Qwen2.5-VL + SANA 1.5
License	Apache 2.0
GRPO Training	GenEval 0.73 → 0.90
Specialty	Text rendering in images

Citation

@article{chen2025blip3,
  title={BLIP3-o: A Family of Fully Open Unified Multimodal Models},
  author={Chen, Jiuhai and others},
  journal={arXiv preprint arXiv:2505.09568},
  year={2025}
}

Safetensors: 4B params, BF16 tensor type

Paper: BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset (arXiv:2505.09568, published May 14, 2025)