FIBO / README.md

Update README.md

5dceea8 verified about 2 months ago

18.1 kB

	---
	language:
	- en
	base_model:
	- briaai/FIBO
	pipeline_tag: text-to-image
	library_name: diffusers
	extra_gated_description: >-
	Bria AI Model weights are open source for non commercial use only, per the
	provided [license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
	extra_gated_heading: Fill in this form to immediatly access the model for non commercial use
	extra_gated_fields:
	Name: text
	Email: text
	Company/Org name: text
	Company Website URL: text
	Discord user: text
	I agree to BRIA’s Privacy policy, Terms & conditions, and acknowledge Non commercial use to be Personal use / Academy / Non profit (direct or indirect): checkbox
	license: other
	license_name: bria-fibo
	license_link: https://creativecommons.org/licenses/by-nc/4.0/deed.en
	widget:
	- text: A man holding a goose while screaming
	output:
	url: images/example_69da5vqgf.png
	tags:
	- art
	- text-to-image
	---
	<!-- ===================== HEADER ===================== -->

	<p align="center">
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/Bria-logo.svg" width="200"/>
	</p>

	<p align="center">
	<!-- GitHub Repo -->
	<a href="https://github.com/Bria-AI/FIBO" target="_blank">
	<img
	alt="GitHub Repo"
	src="https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white&style=for-the-badge"
	/>
	</a>


	<!-- Hugging Face Demo -->
	<a href="https://huggingface.co/spaces/briaai/FIBO" target="_blank">
	<img
	alt="Hugging Face Demo"
	src="https://img.shields.io/badge/Hugging%20Face-Demo-FFD21E?logo=huggingface&logoColor=black&style=for-the-badge"
	/>
	</a>


	<!-- FIBO Demo on Bria (replace URL if you have a specific demo link) -->
	<a href="https://platform.bria.ai/labs/fibo" target="_blank">
	<img
	alt="FIBO Demo on Bria"
	src="https://img.shields.io/badge/FIBO%20Demo-Bria-6C47FF?style=for-the-badge"
	/>
	</a>


	<!-- Bria Platform -->
	<a href="https://platform.bria.ai" target="_blank">
	<img
	alt="Bria Platform"
	src="https://img.shields.io/badge/Bria-Platform-0EA5E9?style=for-the-badge"
	/>
	</a>


	<!-- Bria Discord -->
	<a href="https://discord.com/invite/Nxe9YW9zHS" target="_blank">
	<img
	alt="Bria Discord"
	src="https://img.shields.io/badge/Discord-Join-5865F2?logo=discord&logoColor=white&style=for-the-badge"
	/>
	</a>


	<!-- Tech Paper -->
	<a href="https://arxiv.org/abs/2511.06876" target="_blank">
	<img
	alt="Tech Paper (Coming Soon)"
	src="https://img.shields.io/badge/Tech%20Paper-lightgrey?logo=arxiv&logoColor=red&style=for-the-badge"
	/>
	</a>
	</p>
	<p align="center">
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/car.001.jpeg" width="1024"/>
	</p>

	<p align="center">
	<b>FIBO is the first open-source, JSON-native text-to-image model trained exclusively on long structred captions.</b>
	<br><br>
	<i>Fibo sets a new standard for controllability, predictability, and disentanglement.</i>
	</p>

	<!-- ===================== MAIN CONTENT ===================== -->

	<h2>🌍 What's FIBO?</h2>
	<p>Most text-to-image models excel at imagination—but not control. <b>FIBO</b> is built for professional workflows, not casual use. Trained on <b>structured JSON captions up to 1,000+ words</b>, FIBO enables precise, reproducible control over lighting, composition, color, and camera settings. The structured captions foster native disentanglement, allowing targeted, iterative refinement without prompt drift. With only <b>8B parameters</b>, FIBO delivers high image quality, strong prompt adherence, and professional-grade control—<b>trained exclusively on licensed data</b>.</p>

	<h2> News</h2>
	<ul>
	<li>2025-11-11: Technical report is now available <a href="https://arxiv.org/abs/2511.06876">here</a> 📓</li>
	<li>2025-11-11: Fine-tuning code is now available <a href="https://github.com/Bria-AI/FIBO/tree/main/src/fine_tuning">here</a> 🎉</li>
	</ul>


	<h2>🔑 Key Features</h2>
	<ul>
	<li><b>VLM guided JSON-native prompting</b>: Incorporates any VLM to transform short prompts into structured schemas with 1,000+ words (lighting, camera, composition, DoF).</li>
	<li><b>Iterative controlled generation</b>: generate images from short prompts or keep refining and get inspiration from detailed JSONs and input images</li>
	<li><b>Disentangled control</b>: tweak a single attribute (e.g., camera angle) without breaking the scene.</li>
	<li><b>Enterprise-grade</b>: 100% licensed data; governance, repeatability, and legal clarity.</li>
	<li><b>Strong prompt adherence</b>: high alignment on PRISM-style evaluations.</li>
	<li><b>Built for production</b>: API endpoints (Bria Platform, Fal.ai, Replicate), ComfyUI nodes, and local inference.</li>
	</ul>

	<h2>🎨 Work with FIBO in Three Simple Modes</h2>

	<ul>
	<li>
	<b>Generate:</b> Start with a quick idea. FIBO’s language model expands your short prompt into a rich, structured JSON prompt, then generates the image.
	You get both the image and the expanded prompt.
	</li>
	<li>
	<b>Refine:</b> Continue from a detailed structured prompt add a short instruction - for example, “backlit,” “85 mm,” or “warmer skin tones.”
	FIBO updates <i>only</i> the requested attributes, re-generates the image, and returns the refined prompt alongside it.
	</li>
	<li>
	<b>Inspire:</b> Provide an image instead of text. FIBO’s vision–language model extracts a detailed, structured prompt, blends it with your creative intent, and produces related images—ideal for inspiration without overreliance on the original.
	</li>
	</ul>

	<h2>⚡ Quick Start</h2>

	</p>

	<!-- change it to a nice button -->
	<p align="center">
	🚀 <a href="https://huggingface.co/spaces/briaai/FIBO">Try FIBO now →</a>
	</p>

	<p>FIBO is available everywhere you build, either as source-code and weights, ComfyUI nodes or API endpoints.</p>

	<p><b>API Endpoint:</b></p>
	<ul>
	<li><a href="https://docs.bria.ai/image-generation/v2-endpoints/image-generate">Bria.ai</a></li>
	<li><a href="https://fal.ai/models/bria/fibo/generate">Fal.ai</a></li>
	<li><a href="https://replicate.com/bria/fibo">Replicate</a></li>
	</ul>

	<p><b>ComfyUI:</b>
	<ul>
	<li><a href="https://github.com/Bria-AI/ComfyUI-BRIA-API/blob/main/nodes/generate_image_node_v2.py">Generate Node</a></li>
	<li><a href="https://github.com/Bria-AI/ComfyUI-BRIA-API/blob/main/nodes/refine_image_node_v2.py">Refine Node</a></li>
	</ul>

	<p><b>Source-Code & Weights</b></p>

	<ul>
	<li>The model is open source for non-commercial use with <a href="https://creativecommons.org/licenses/by-nc/4.0/deed.en">this license</a> </li>
	<li>For commercial use <a href="https://bria.ai/contact-us?hsCtaAttrib=114250296256">Click here</a>.</li>
	</ul>

	<h2>Quick Start Guide</h2>
	<details>
	<summary>Install Diffusers And Additional Requirements</summary>
	<p>Install Diffusers from the source code:</p>
	<pre><code class="language-bash">pip install git+https://github.com/huggingface/diffusers torch torchvision google-genai boltons ujson sentencepiece accelerate transformers
	</code></pre>
	</details>
	<h3>Generate</h3>

	<p>FIBO uses a VLM that transforms short prompts into detailed structured prompts that are used to generate images. You can use the following code to generate images using Gemini via the Google API - requires a GOOGLE_API_KEY, or uncomment the relevant section to run a local VLM instead (FIBO-VLM):</p>

	```python
	import json
	import os

	import torch
	from diffusers import BriaFiboPipeline
	from diffusers.modular_pipelines import ModularPipeline

	# -------------------------------
	# Load the VLM pipeline
	# -------------------------------
	torch.set_grad_enabled(False)
	# Using Gemini API, requires GOOGLE_API_KEY environment variable
	assert os.getenv("GOOGLE_API_KEY") is not None, "GOOGLE_API_KEY environment variable is not set"
	vlm_pipe = ModularPipeline.from_pretrained("briaai/FIBO-gemini-prompt-to-JSON", trust_remote_code=True)

	# Using local VLM, uncomment to run
	# vlm_pipe = ModularPipeline.from_pretrained("briaai/FIBO-VLM-prompt-to-JSON", trust_remote_code=True)


	# Load the FIBO pipeline
	pipe = BriaFiboPipeline.from_pretrained(
	"briaai/FIBO",
	torch_dtype=torch.bfloat16,
	)
	pipe.to("cuda")
	# pipe.enable_model_cpu_offload() # uncomment if you're getting CUDA OOM errors

	# -------------------------------
	# Run Prompt to JSON
	# -------------------------------

	# Create a prompt to generate an initial image
	output = vlm_pipe(
	prompt="A hyper-detailed, ultra-fluffy owl sitting in the trees at night, looking directly at the camera with wide, adorable, expressive eyes. Its feathers are soft and voluminous, catching the cool moonlight with subtle silver highlights. The owl's gaze is curious and full of charm, giving it a whimsical, storybook-like personality."
	)
	json_prompt_generate = output.values["json_prompt"]

	def get_default_negative_prompt(existing_json: dict) -> str:
	negative_prompt = ""
	style_medium = existing_json.get("style_medium", "").lower()
	if style_medium in ["photograph", "photography", "photo"]:
	negative_prompt = """{'style_medium':'digital illustration','artistic_style':'non-realistic'}"""
	return negative_prompt


	negative_prompt = get_default_negative_prompt(json.loads(json_prompt_generate))

	# -------------------------------
	# Run Image Generation
	# -------------------------------
	# Generate the image from the structured json prompt
	results_generate = pipe(
	prompt=json_prompt_generate, num_inference_steps=50, guidance_scale=5, negative_prompt=negative_prompt
	)
	results_generate.images[0].save("image_generate.png")
	with open("image_generate_json_prompt.json", "w") as f:
	f.write(json_prompt_generate)

	```

	<p><img src="https://bria-public.s3.us-east-1.amazonaws.com/owl.png" alt="alt text" width="300"/></p>


	<h3>Refine</h3>
	<p>FIBO supports iterative generation. Given a structured prompt and an instruction, FIBO refines the output.</p>

	```python
	output = vlm_pipe(
	json_prompt=json_prompt_generate, prompt="make the owl brown"
	)
	json_prompt_refine_from_image = output.values["json_prompt"]

	negative_prompt = get_default_negative_prompt(json.loads(json_prompt_refine_from_image))
	results_refine_from_image = pipe(
	prompt=json_prompt_refine_from_image, num_inference_steps=50, guidance_scale=5, negative_prompt=negative_prompt
	)
	results_refine_from_image.images[0].save("image_refine_from_image.png")
	with open("image_refine_from_image_json_prompt.json", "w") as f:
	f.write(json_prompt_refine_from_image)
	```
	<style>
	.image-row {
	display: flex;
	gap: 20px;
	justify-content: center;
	flex-wrap: wrap; /* allows wrapping on smaller screens */
	}
	.image-row figure {
	text-align: center;
	font-size: 0.9em;
	}
	.image-row img {
	width: 300px;
	border-radius: 4px;
	}
	</style>

	<div class="image-row">
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/make_owl_brown.png" alt="Make owl brown"/>
	<figcaption>--> Make the owl brown</figcaption>
	</figure>
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/turn_owl_into_a_lemur_.png" alt="Turn owl into a lemur"/>
	<figcaption>--> Turn the owl into a lemur</figcaption>
	</figure>
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/add_jungle_vegetation_to_the_dark_background.png" alt="Add jungle vegetation"/>
	<figcaption>--> Add jungle vegetation</figcaption>
	</figure>
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/add_sunlight.png" alt="Add sunlight"/>
	<figcaption>--> Add sunlight</figcaption>
	</figure>
	</div>

	<h3>Inspire</h3>
	<p>Start from an image as inspiration and let Fibo regenerate a variation of it or merge your creative intent into the next generation</p>

	```python
	from PIL import Image
	original_astronaut_image = Image.open("<path to original astronaut image>")
	output = vlm_pipe(
	image=original_astronaut_image, prompt="")
	json_prompt_inspire = output.values["json_prompt"]
	negative_prompt = get_default_negative_prompt(json.loads(json_prompt_inspire))
	results_inspire = pipe(
	prompt=json_prompt_inspire, num_inference_steps=50, guidance_scale=5, negative_prompt=negative_prompt
	)
	results_inspire.images[0].save("image_inspire_no_prompt.png")
	with open("image_inspire_json_prompt_no_prompt.json", "w") as f:
	f.write(json_prompt_inspire)

	output = vlm_pipe(
	image=original_astronaut_image, prompt="Make futuristic")
	json_prompt_inspire = output.values["json_prompt"]
	negative_prompt = get_default_negative_prompt(json.loads(json_prompt_inspire))

	results_inspire = pipe(
	prompt=json_prompt_inspire, num_inference_steps=50, guidance_scale=5, negative_prompt=negative_prompt
	)
	results_inspire.images[0].save("image_inspire_with_prompt.png")
	with open("image_inspire_json_prompt_with_prompt.json", "w") as f:
	f.write(json_prompt_inspire)
	```

	<div class="image-row">
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/original.png" alt="original image"/>
	<figcaption>original image</figcaption>
	</figure>
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/no_prompt.png" alt="No prompt"/>
	<figcaption>Inspire #1: No prompt</figcaption>
	</figure>
	<figure>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/make_futuristic.png" alt="Make futuristic"/>
	<figcaption>Inspire #2: Make futuristic</figcaption>
	</figure>

	</div>

	<h3>Advanced Usage</h3>
	<details>
	<summary>Gemini Setup [optional]</summary>

	<p>FIBO supports any VLM as part of the pipeline. To use Gemini as VLM backbone for FIBO, follow these instructions:</p>

	<ol>
	<li>
	<p><b>Obtain a Gemini API Key</b><br/>
	Sign up for the <a href="https://aistudio.google.com/app/apikey">Google AI Studio (Gemini)</a> and create an API key.</p>
	</li>
	<li>
	<p><b>Set the API Key as an Environment Variable</b><br/>
	Store your Gemini API key in the <code>GEMINI_API_KEY</code> environment variable:</p>
	<pre><code class="language-bash">export GEMINI_API_KEY=your_gemini_api_key
	</code></pre>
	<p>You can add the above line to your <code>.bashrc</code>, <code>.zshrc</code>, or similar shell profile for persistence.</p>
	</li>
	</ol>

	</details>

	<p>see the examples in the <a href="examples">examples</a> directory for more details.</p>

	<h2>🧠 Training and Architecture</h2>

	<p><strong>FIBO</strong> is an 8B-parameter DiT-based, flow-matching text-to-image model trained <strong>exclusively on licensed data</strong> and on <strong>> long, structured JSON captions</strong> (~1,000 words each), enabling strong prompt adherence and professional-grade control. It uses <strong>SmolLM3-3B</strong> as the text encoder with a novel <strong>DimFusion</strong> conditioning architecture for efficient long-caption training, and <strong>Wan 2.2</strong> as the VAE. The structured supervision promotes native disentanglement for targeted, iterative refinement without prompt drift, while VLM-assisted prompting expands short user intents, fills in missing details, and extracts/edits structured prompts from images using our fine-tuned <strong>Qwen-2.5</strong>-based VLM or <strong>Gemini 2.5 Flash</strong>. For reproducibility, we provide the assistant system prompt and the structured-prompt JSON schema across the “Generate,” “Refine,” and “Inspire” modes.</p>

	<h2 id="data-distribution">Data Distribution</h2>

	<p>FIBO was trained on curated set of image–caption pairs selected from ~1B image dataset as shown in the dataset distribution. All assets are vetted for commercial use, attribution traceability, and regional compliance under GDPR and the <strong>EU AI Act</strong>. This broad and balanced dataset ensures FIBO’s ability to generalize across a wide range of visual domains, from realistic human imagery to graphic design and product visualization, while maintaining full licensing compliance.</p>

	<p><img src="https://bria-public.s3.us-east-1.amazonaws.com/DataAttr.png" alt="alt text" width="800"/></p>

	<h2 id="Evaluation">Evaluation</h2>

	<!-- ===================== BENCHMARK TABLE FIGURE ===================== -->
	<h3 id="PRISM Benchmark model-comparison">PRISM Benchmark Model Comparison</h3>

	<p>Using a licensed-data subset of PRISM-Bench, we evaluate image–text alignment and aesthetics. <strong>FIBO</strong> outperforms comparable open-source baselines, suggesting strong prompt adherence, controllability and aesthetics from structured-caption training.</p>

	<img src="https://bria-public.s3.us-east-1.amazonaws.com/Benchmark.png" alt="Benchmark Chart" width="800"/>

	<h2 id="More Samples">More Samples</h2>

	<p>Generate</p>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/Generate.png" alt="Benchmark Chart" width="800"/>

	<p>Inspire & Refine</p>
	<img src="https://bria-public.s3.us-east-1.amazonaws.com/Refine.ong.png" alt="Benchmark Chart" width="800"/>


	<p>FIBO is inspired by the Fibonacci sequence, where math meets beauty through the golden ratio—nature’s and design’s timeless key to harmony.<p>

	<p>If you have questions about this repository, feedback to share, or want to contribute directly, we welcome your issues and pull requests on GitHub. Your contributions help make FIBO better for everyone.</p>

	<p>If you're passionate about fundamental research, we're hiring full-time employees (FTEs) and research interns. Don't wait - reach out to us at hr@bria.ai</p>

	## Citation

	We kindly encourage citation of our work if you find it useful.

	```bibtex
	@article{gutflaish2025generating,
	title={Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions},
	author={Gutflaish, Eyal and Kachlon, Eliran and Zisman, Hezi and Hacham, Tal and Sarid, Nimrod and Visheratin, Alexander and Huberman, Saar and Davidi, Gal and Bukchin, Guy and Goldberg, Kfir and others},
	journal={arXiv preprint arXiv:2511.06876},
	year={2025}
	}
	```

	<p align="center"><b>❤️ FIBO model card and ⭐ Star FIBO on GitHub to join the movement for responsible generative AI!</b></p>