language:
  - en
base_model:
  - briaai/FIBO
pipeline_tag: text-to-image
library_name: diffusers
license: other
license_name: bria-fibo
license_link: https://creativecommons.org/licenses/by-nc/4.0/deed.en
extra_gated_description: >-
  Bria AI Model weights are open source for non commercial use only, per the
  provided [license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
extra_gated_heading: Fill in this form to immediately access the model for non-commercial use
extra_gated_fields:
  Name: text
  Email: text
  Company/Org name: text
  Company Website URL: text
  Discord user: text
  I agree to BRIA’s Privacy policy, Terms & conditions, and acknowledge Non commercial use to be Personal use / Academy / Non profit (direct or indirect): checkbox

FIBO is the first open-source, JSON-native text-to-image model trained exclusively on long structured captions.

FIBO sets a new standard for controllability, predictability, and disentanglement.

🌍 What's FIBO?

Most text-to-image models excel at imagination but not control. FIBO is built for professional workflows, not casual use. Trained on structured JSON captions that run to 1,000 words or more, FIBO enables precise, reproducible control over lighting, composition, color, and camera settings. The structured captions foster native disentanglement, allowing targeted, iterative refinement without prompt drift. With only 8B parameters, FIBO delivers high image quality, strong prompt adherence, and professional-grade control, and it is trained exclusively on licensed data.

🔑 Key Features

VLM-guided JSON-native prompting: Incorporates any VLM to transform short prompts into structured schemas with 1,000+ words (lighting, camera, composition, DoF).
Iterative controlled generation: generate images from short prompts, then keep refining them or draw inspiration from detailed JSON prompts and input images.
Disentangled control: tweak a single attribute (e.g., camera angle) without breaking the scene.
Enterprise-grade: 100% licensed data; governance, repeatability, and legal clarity.
Strong prompt adherence: high alignment on PRISM-style evaluations.
Built for production: API endpoints (Bria Platform, Fal.ai, Replicate), ComfyUI nodes, and local inference.
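
To illustrate what disentangled control looks like at the prompt level, here is a minimal sketch in plain Python. The field names (`subject`, `lighting`, `camera`, `angle`, and so on) and the helper `set_attribute` are hypothetical and chosen for illustration only; they are not FIBO's actual schema or API. The helper changes one attribute and leaves everything else untouched.

```python
import copy

# Hypothetical structured prompt; these keys are illustrative, not FIBO's real schema.
base_prompt = {
    "subject": "an owl perched on a branch",
    "lighting": {"type": "moonlight", "direction": "backlit"},
    "camera": {"angle": "eye level", "focal_length_mm": 85},
}


def set_attribute(prompt, path, value):
    """Return a copy of `prompt` with the nested key at `path` set to `value`."""
    updated = copy.deepcopy(prompt)  # never mutate the caller's prompt
    node = updated
    for key in path[:-1]:
        node = node[key]  # walk down to the parent of the target key
    node[path[-1]] = value
    return updated


tweaked = set_attribute(base_prompt, ["camera", "angle"], "low angle")
print(tweaked["camera"]["angle"])                      # the edited attribute
print(tweaked["lighting"] == base_prompt["lighting"])  # untouched attributes compare equal
```

Because the edit is applied to a deep copy, the original prompt remains available for comparison or rollback.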

🎨 Work with FIBO in Three Simple Modes

Generate: Start with a quick idea. FIBO’s language model expands your short prompt into a rich, structured JSON prompt, then generates the image. You get both the image and the expanded prompt.
Refine: Continue from a detailed structured prompt and add a short instruction, for example “backlit,” “85 mm,” or “warmer skin tones.” FIBO updates only the requested attributes, re-generates the image, and returns the refined prompt alongside it.
Inspire: Provide an image instead of text. FIBO’s vision–language model extracts a detailed, structured prompt, blends it with your creative intent, and produces related images—ideal for inspiration without overreliance on the original.
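
The three modes differ only in which inputs are supplied. The sketch below makes that mapping explicit. `select_mode` is a hypothetical helper, not part of FIBO or Diffusers; its argument names mirror the `prompt`, `json_prompt`, and `image` keyword arguments used in the Quick Start Guide, and giving the image precedence when several inputs are present is an assumption of this sketch.

```python
def select_mode(prompt=None, json_prompt=None, image=None):
    """Infer which FIBO mode a set of inputs corresponds to (hypothetical helper)."""
    if image is not None:
        # Inspire: an image, optionally blended with a short instruction.
        return "inspire"
    if json_prompt is not None:
        # Refine: an existing structured prompt plus a short instruction.
        if not prompt:
            raise ValueError("Refine needs a short instruction alongside the JSON prompt")
        return "refine"
    if prompt:
        # Generate: a short prompt that the VLM expands into a structured prompt.
        return "generate"
    raise ValueError("Provide a prompt, a JSON prompt plus an instruction, or an image")


print(select_mode(prompt="a fluffy owl at night"))
print(select_mode(json_prompt="{}", prompt="make the owl brown"))
```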

⚡ Quick Start

🚀 Try FIBO now →

FIBO is available everywhere you build: as source code and weights, as ComfyUI nodes, or through API endpoints.

API Endpoint:

ComfyUI: Use it in workflows (Soon)

Source-Code & Weights

The model is open source for non-commercial use under this license.
For commercial use, click here.

Quick Start Guide

Install Diffusers

Install Diffusers from the source code:

```
pip install git+https://github.com/huggingface/diffusers
```

Generate

FIBO transforms short prompts into detailed structured prompts, which are then used to generate images. You can use the following code to generate images using a local VLM (FIBO-VLM):

```python
import json

import torch
from diffusers import BriaFiboPipeline, BriaFiboVLMPromptToJson

# -------------------------------
# Section: Initialization
# -------------------------------
torch.set_grad_enabled(False)  # inference only, so gradients are not needed

# VLM that expands a short prompt into a structured JSON prompt
vlm_block = BriaFiboVLMPromptToJson(model_id="briaai/vlm-processor-new")
vlm_pipe = vlm_block.init_pipeline()

# Text-to-image pipeline that consumes the structured JSON prompt
pipe = BriaFiboPipeline.from_pretrained(
    "briaai/GAIA-Alpha-diffusers",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# -------------------------------
# Section: Image Generation
# -------------------------------

# 1. Create a prompt to generate an initial image
short_prompt = (
    "A hyper-detailed, ultra-fluffy owl sitting in the trees at night, "
    "looking directly at the camera with wide, adorable, expressive eyes. "
    "Its feathers are soft and voluminous, catching the cool moonlight with "
    "subtle silver highlights. The owl's gaze is curious and full of charm, "
    "giving it a whimsical, storybook-like personality."
)
output = vlm_pipe(prompt=short_prompt)
json_prompt_generate = output.values["json_prompt"]

# 2. Generate the image from the structured JSON prompt
results_generate = pipe(
    prompt=json_prompt_generate, num_inference_steps=50, guidance_scale=5
)
results_generate.images[0].save("image_generate.png")

# 3. Save the structured prompt next to the image so the result can be reproduced
with open("image_generate_json_prompt.json", "w") as f:
    json.dump(json_prompt_generate, f)
```


Refine

FIBO supports iterative generation. Given a structured prompt and an instruction, FIBO refines the output.

```python
# Pass the previous structured prompt together with a short instruction
output = vlm_pipe(
    json_prompt=json_prompt_generate, prompt="make the owl brown"
)
json_prompt_refine_from_image = output.values["json_prompt"]

# Re-generate the image from the refined structured prompt
results_refine_from_image = pipe(
    prompt=json_prompt_refine_from_image, num_inference_steps=50, guidance_scale=5
)
results_refine_from_image.images[0].save("image_refine_from_image.png")
with open("image_refine_from_image_json_prompt.json", "w") as f:
    json.dump(json_prompt_refine_from_image, f)
```
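
One way to check that a refinement touched only the requested attributes is to diff the two structured prompts. This is a minimal sketch, assuming the prompts are either JSON strings or already-parsed dictionaries; `changed_paths` is a hypothetical helper, and the sample prompts use made-up keys rather than FIBO's schema.

```python
import json

_MISSING = object()  # sentinel so a missing key differs from an explicit None


def _flatten(node, prefix=""):
    """Flatten nested dicts and lists into {"a.b.0": leaf_value} pairs."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return {prefix: node}
    flat = {}
    for key, child in items:
        flat.update(_flatten(child, f"{prefix}.{key}" if prefix else str(key)))
    return flat


def changed_paths(before, after):
    """Return the sorted key paths whose values differ between two structured prompts."""
    if isinstance(before, str):
        before = json.loads(before)
    if isinstance(after, str):
        after = json.loads(after)
    a, b = _flatten(before), _flatten(after)
    return sorted(k for k in a.keys() | b.keys() if a.get(k, _MISSING) != b.get(k, _MISSING))


# Made-up prompts for illustration only
before = {"subject": {"species": "owl", "color": "grey"}, "lighting": "moonlight"}
after = {"subject": {"species": "owl", "color": "brown"}, "lighting": "moonlight"}
print(changed_paths(before, after))  # ['subject.color']
```

A short list of changed paths suggests the edit stayed local; a long one is a hint that the instruction caused more of the prompt to be rewritten than intended.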

[Figure: refinement example with the instruction "Turn the owl into a lemur"]

Inspire

Start from an image as inspiration and let FIBO regenerate a variation of it, or merge your creative intent into the next generation.

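```python
from PIL import Image

# Assumption: the reference image is a PIL image loaded from a local file;
# the path below is a placeholder for your own image.
original_astronaut_image = Image.open("original_astronaut_image.png")

# Extract a structured prompt from the image alone (empty instruction)
output = vlm_pipe(image=original_astronaut_image, prompt="")
json_prompt_inspire = output.values["json_prompt"]

results_inspire = pipe(
    prompt=json_prompt_inspire, num_inference_steps=50, guidance_scale=5
)
results_inspire.images[0].save("image_inspire.png")
with open("image_inspire_json_prompt.json", "w") as f:
    json.dump(json_prompt_inspire, f)
```

```python
# Blend a short instruction with the image.
# Note: this overwrites the files saved in the previous step.
output = vlm_pipe(
    image=original_astronaut_image, prompt="Add green to the helmet color"
)
json_prompt_inspire = output.values["json_prompt"]

results_inspire = pipe(
    prompt=json_prompt_inspire, num_inference_steps=50, guidance_scale=5
)
results_inspire.images[0].save("image_inspire.png")
with open("image_inspire_json_prompt.json", "w") as f:
    json.dump(json_prompt_inspire, f)
```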

Inspire #2: Add green to the helmet color

Advanced Usage

Gemini Setup [optional]

FIBO supports any VLM as part of the pipeline. To use Gemini as the VLM backbone for FIBO, follow these instructions:

Obtain a Gemini API Key
Sign up for the Google AI Studio (Gemini) and create an API key.
Set the API Key as an Environment Variable
Store your Gemini API key in the GEMINI_API_KEY environment variable:
```
export GEMINI_API_KEY=your_gemini_api_key
```
You can add the above line to your .bashrc, .zshrc, or similar shell profile for persistence.
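
Before running a Gemini-backed pipeline, it can help to fail fast if the key is missing. The following is a minimal sketch using only the standard library; `require_gemini_key` is a hypothetical helper name and is not part of FIBO or Diffusers.

```python
import os


def require_gemini_key(env=None):
    """Return the Gemini API key, or raise a clear error if it is not set."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before using Gemini as the VLM"
        )
    return key
```

Calling `require_gemini_key()` at the top of a script surfaces a missing key immediately instead of partway through model loading.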

See the examples in the examples directory for more details.

🧠 Training and Architecture

FIBO is an 8B-parameter DiT-based, flow-matching text-to-image model trained exclusively on licensed data and on >100M long, structured JSON captions (~1,000 words each), enabling strong prompt adherence and professional-grade control. It uses SmolLM3-3B as the text encoder with a novel DimFusion conditioning architecture for efficient long-caption training, and Wan 2.2 as the VAE. The structured supervision promotes native disentanglement for targeted, iterative refinement without prompt drift, while VLM-assisted prompting expands short user intents, fills in missing details, and extracts/edits structured prompts from images using our fine-tuned Qwen-2.5-based VLM or Gemini 2.5 Flash. For reproducibility, we provide the assistant system prompt and the structured-prompt JSON schema across the “Generate,” “Refine,” and “Inspire” modes.
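
As a back-of-the-envelope aid for local inference, the parameter counts above translate into a rough weight-memory footprint. The sketch assumes bfloat16 weights (2 bytes per parameter, matching the `torch_dtype` in the Quick Start code), assumes the 8B figure refers to the diffusion transformer alone, and treats the text encoder as exactly 3B parameters. The VAE, activations, and any VLM are ignored, so treat the result as a lower bound rather than a measured figure.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory needed to hold model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


transformer_gib = weight_memory_gib(8e9)   # 8B-parameter DiT in bfloat16
text_encoder_gib = weight_memory_gib(3e9)  # SmolLM3-3B text encoder in bfloat16
total_gib = transformer_gib + text_encoder_gib

print(f"transformer:  {transformer_gib:.1f} GiB")   # 14.9 GiB
print(f"text encoder: {text_encoder_gib:.1f} GiB")  # 5.6 GiB
print(f"total:        {total_gib:.1f} GiB")         # 20.5 GiB
```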

Data Distribution

FIBO was trained on over 100M licensed image–caption pairs as shown in the dataset distribution. All assets are vetted for commercial use, attribution traceability, and regional compliance under GDPR and the EU AI Act. This broad and balanced dataset ensures FIBO’s ability to generalize across a wide range of visual domains, from realistic human imagery to graphic design and product visualization, while maintaining full licensing compliance.

[Figure: dataset distribution]

Evaluation

PRISM Benchmark Model Comparison

Using a licensed-data subset of PRISM-Bench, we evaluate image–text alignment and aesthetics. FIBO outperforms comparable open-source baselines, suggesting strong prompt adherence, controllability and aesthetics from structured-caption training.

| # | Model (Size) | Text-Alignment | Aesthetics |
|---|---|---|---|
| 1 | FIBO (8B) | 87.8 | 82.1 |
| 2 | Qwen-Image (20B) | 84.1 | 81.5 |
| 3 | FLUX.1-Krea-dev (12B) | 79.7 | 79.7 |
| 4 | HiDream-I1-Full (17B) | 80.0 | 79.1 |
| 5 | FLUX.1-dev (12B) | 77.0 | 78.7 |
| 6 | SD3.5-Large (12B) | 77.9 | 77.6 |
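
To read the table at a glance, the short script below computes FIBO's margin over the best competing model on each metric. The scores are copied from the table; `margin_over_best_other` is a helper written for this illustration only.

```python
# (text_alignment, aesthetics) scores copied from the table above
scores = {
    "FIBO (8B)": (87.8, 82.1),
    "Qwen-Image (20B)": (84.1, 81.5),
    "FLUX.1-Krea-dev (12B)": (79.7, 79.7),
    "HiDream-I1-Full (17B)": (80.0, 79.1),
    "FLUX.1-dev (12B)": (77.0, 78.7),
    "SD3.5-Large (12B)": (77.9, 77.6),
}


def margin_over_best_other(scores, model, metric_index):
    """Difference between `model` and the best other model on one metric, in points."""
    best_other = max(v[metric_index] for name, v in scores.items() if name != model)
    return round(scores[model][metric_index] - best_other, 1)


print(margin_over_best_other(scores, "FIBO (8B)", 0))  # 3.7 (text alignment)
print(margin_over_best_other(scores, "FIBO (8B)", 1))  # 0.6 (aesthetics)
```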

Prism Benchmark

[Figure: PRISM benchmark results]

If you have questions about this repository, feedback to share, or want to contribute directly, we welcome your issues and pull requests on GitHub. Your contributions help make FIBO better for everyone.

If you're passionate about fundamental research, we're hiring full-time employees (FTEs) and research interns. Don't wait - reach out to us at hr@bria.ai

⭐ Star FIBO on GitHub and join the movement for responsible generative AI!
