Commit c6d448d (verified) by adrianrm, parent f517ba0: Update README.md
---
license: cc-by-nc-4.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusion
- text-to-image
- ambient diffusion
- low-quality data
- synthetic data
---

# Ambient Dataloops: Generative Models for Dataset Refinement

## Model Description

Ambient Dataloops is an iterative framework for refining datasets that makes it easier for diffusion models to learn the underlying data distribution. It not only uses low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models, but in turn uses the model to improve the quality of those samples. Like the other approaches in the Ambient family, Ambient Dataloops extracts valuable signal from all available images during training, including data typically discarded as "low-quality", unlike traditional approaches that rely on highly curated datasets.

This model card is for a text-to-image diffusion model trained on only 8 H100 GPUs. The key innovation over [Ambient Omni](https://huggingface.co/giannisdaras/ambient-o) is the refinement of low-quality synthetic data, which was previously only used as "noisy" samples.
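At a high level, one dataloop iteration alternates between training on the mixed data and refining the low-quality subset. The following is a minimal sketch with hypothetical function names and toy stand-ins, not the actual training code:

```python
def dataloop(real_data, noisy_data, train, refine, iterations=1):
    """One or more dataset-refinement rounds: train a model on real plus
    (noise-level-gated) noisy data, then use that model to produce a
    cleaner version of the noisy subset for the next round."""
    model = None
    for _ in range(iterations):
        model = train(real_data, noisy_data)  # e.g. Ambient-Omni-style training
        noisy_data = [refine(model, x) for x in noisy_data]
    return model, noisy_data

# Toy stand-ins: "training" records the data sizes, "refinement" tags samples.
def toy_train(real, noisy):
    return ("model", len(real), len(noisy))

def toy_refine(model, sample):
    return sample + "_refined"

model, refined = dataloop(["r1", "r2"], ["s1"], toy_train, toy_refine)
```

For this model card, only a single iteration was run (DiffusionDB was refined once, as described below).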

## Architecture

Ambient Dataloops builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase: we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters.

## Text-to-Image Results

Ambient Dataloops demonstrates improvements in text-to-image generation compared to the Ambient Omni baseline, which does not refine its low-quality data.

### Training Data Composition

The model was trained on a diverse mixture of datasets:
- **Conceptual Captions (CC12M)**: 12M image-caption pairs
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion

Data from DiffusionDB were treated as noisy samples and refined once to obtain the final training set.

### Technical Approach

#### Use synthetic samples
As a first step, we use the Ambient Omni algorithm to train an initial diffusion model, treating the samples from DiffusionDB as noisy, i.e., only using them at noise levels $\sigma \geq 2.0$.

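The noise-level gating can be sketched as follows (hypothetical helper names and constants mirroring the description above; the actual MicroDiffusion training code differs):

```python
# Minimum noise level at which each data source may supervise the model.
# Real data is trusted at every noise level; DiffusionDB samples only
# contribute to the loss when sigma >= 2.0 (illustrative constants).
SIGMA_MIN = {"real": 0.0, "diffusiondb": 2.0}

def usable_for_loss(source: str, sigma: float) -> bool:
    """Return True if a sample from `source` may supervise at noise level sigma."""
    return sigma >= SIGMA_MIN[source]

def gate_batch(batch, sigmas):
    """Keep only the (sample, sigma) pairs allowed by the gating rule."""
    return [(x, s) for (x, src), s in zip(batch, sigmas) if usable_for_loss(src, s)]

# Example: a toy batch mixing real and synthetic samples.
batch = [("img0", "real"), ("img1", "diffusiondb"), ("img2", "diffusiondb")]
sigmas = [0.5, 0.5, 2.5]
kept = gate_batch(batch, sigmas)
# Only the real sample at sigma=0.5 and the synthetic one at sigma=2.5 survive.
```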
#### Refine synthetic samples
Next, we use the trained model to *refine* the synthetic samples by drawing posterior samples. These refined samples are better than before, but still not as good as real samples, so we continue to treat them as noisy, though less so than before, i.e., only using them at noise levels $\sigma \geq 1.0$.

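Schematically, refinement re-noises each synthetic image to the noise level at which it is trusted, then denoises it with the trained model. Below is a toy sketch with a stand-in denoiser (purely illustrative; the real posterior sampler uses the trained 1.1B-parameter diffusion transformer):

```python
import random

def refine(image, denoiser, sigma=2.0, seed=0):
    """Schematic refinement: re-noise a synthetic sample to noise level
    `sigma`, then map it back to a cleaner sample with the denoiser."""
    rng = random.Random(seed)
    noised = [x + sigma * rng.gauss(0.0, 1.0) for x in image]
    return denoiser(noised, sigma)

def toy_denoiser(noised, sigma, prior_var=1.0):
    """Stand-in denoiser: shrinks values toward zero proportionally to
    sigma, mimicking a posterior-mean estimate under a zero-mean Gaussian
    prior (illustrative only)."""
    shrink = prior_var / (prior_var + sigma**2)
    return [shrink * x for x in noised]

refined = refine([0.3, -0.7, 1.2], toy_denoiser, sigma=2.0)
```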
## Usage

```python
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')

# Download weights from HF
model_dict_path = hf_hub_download(repo_id="adrianrm/ambient-dataloops", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)

# Convert parameters to float32 + load
float_model_params = {
    k: v.to(torch.float32) for k, v in model_dict.items()
}
model.dit.load_state_dict(float_model_params)

# Eval mode
model = model.eval()

# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "A illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation",
    "the sailor galaxia. beautiful, realistic painting by mucha and kuvshinov and bilibin. watercolor, thick lining, manga, soviet realism",
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
```

## Citation

```bibtex
@article{rodriguez2025ambient,
  title  = {Ambient Dataloops: Generative Models for Dataset Refinement},
  author = {Rodriguez-Munoz, A. and Daspit, W. and Klivans, A. and Torralba, A. and Daskalakis, C. and Daras, G.},
  year   = {2025},
}
```

## License

The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo.