Commit c6d448d (verified) by adrianrm, parent f517ba0: Update README.md
---
license: cc-by-nc-4.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusion
- text-to-image
- ambient diffusion
- low-quality data
- synthetic data
---

# Ambient Dataloops: Generative Models for Dataset Refinement

## Model Description

Ambient Dataloops is an iterative framework for refining datasets that makes it easier for diffusion models to learn the underlying data distribution. It not only uses low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models, but in turn uses the model to improve the quality of those samples. Like the other approaches in the Ambient family, Ambient Dataloops extracts valuable signal from all available images during training, including data typically discarded as "low-quality", unlike traditional approaches that rely on highly curated datasets.

This model card is for a text-to-image diffusion model trained on only 8 H100 GPUs. The key innovation over [Ambient Omni](https://huggingface.co/giannisdaras/ambient-o) is the refinement of low-quality synthetic data, which was previously only used as "noisy" samples.
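At a high level, one dataloop iteration alternates between training on the mixed data and refining the low-quality subset. The following is a minimal sketch with hypothetical function names and toy stand-ins, not the actual training code:

```python
def dataloop(real_data, noisy_data, train, refine, iterations=1):
    """One or more dataset-refinement rounds: train a model on real plus
    (noise-level-gated) noisy data, then use that model to produce a
    cleaner version of the noisy subset for the next round."""
    model = None
    for _ in range(iterations):
        model = train(real_data, noisy_data)  # e.g. Ambient-Omni-style training
        noisy_data = [refine(model, x) for x in noisy_data]
    return model, noisy_data

# Toy stand-ins: "training" records the data sizes, "refinement" tags samples.
def toy_train(real, noisy):
    return ("model", len(real), len(noisy))

def toy_refine(model, sample):
    return sample + "_refined"

model, refined = dataloop(["r1", "r2"], ["s1"], toy_train, toy_refine)
```

For this model card, only a single iteration was run (DiffusionDB was refined once, as described below).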

## Architecture

Ambient Dataloops builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase: we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters.

## Text-to-Image Results

Ambient Dataloops demonstrates improvements in text-to-image generation compared to the Ambient Omni baseline, which does not refine its low-quality data.

### Training Data Composition

The model was trained on a diverse mixture of datasets:
- **Conceptual Captions (CC12M)**: 12M image-caption pairs
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion

Data from DiffusionDB were treated as noisy samples and refined once to obtain the final training set.

### Technical Approach

#### Use synthetic samples
As a first step, we use the Ambient Omni algorithm to train an initial diffusion model, treating the samples from DiffusionDB as noisy, i.e., only using them at noise levels $\sigma \geq 2.0$.

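The noise-level gating can be sketched as follows (hypothetical helper names and constants mirroring the description above; the actual MicroDiffusion training code differs):

```python
# Minimum noise level at which each data source may supervise the model.
# Real data is trusted at every noise level; DiffusionDB samples only
# contribute to the loss when sigma >= 2.0 (illustrative constants).
SIGMA_MIN = {"real": 0.0, "diffusiondb": 2.0}

def usable_for_loss(source: str, sigma: float) -> bool:
    """Return True if a sample from `source` may supervise at noise level sigma."""
    return sigma >= SIGMA_MIN[source]

def gate_batch(batch, sigmas):
    """Keep only the (sample, sigma) pairs allowed by the gating rule."""
    return [(x, s) for (x, src), s in zip(batch, sigmas) if usable_for_loss(src, s)]

# Example: a toy batch mixing real and synthetic samples.
batch = [("img0", "real"), ("img1", "diffusiondb"), ("img2", "diffusiondb")]
sigmas = [0.5, 0.5, 2.5]
kept = gate_batch(batch, sigmas)
# Only the real sample at sigma=0.5 and the synthetic one at sigma=2.5 survive.
```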
#### Refine synthetic samples
Next, we use the trained model to *refine* the synthetic samples by drawing posterior samples. These refined samples are better than before, but still not as good as real samples, so we continue to treat them as noisy, though less so than before, i.e., only using them at noise levels $\sigma \geq 1.0$.

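Schematically, refinement re-noises each synthetic image to the noise level at which it is trusted, then denoises it with the trained model. Below is a toy sketch with a stand-in denoiser (purely illustrative; the real posterior sampler uses the trained 1.1B-parameter diffusion transformer):

```python
import random

def refine(image, denoiser, sigma=2.0, seed=0):
    """Schematic refinement: re-noise a synthetic sample to noise level
    `sigma`, then map it back to a cleaner sample with the denoiser."""
    rng = random.Random(seed)
    noised = [x + sigma * rng.gauss(0.0, 1.0) for x in image]
    return denoiser(noised, sigma)

def toy_denoiser(noised, sigma, prior_var=1.0):
    """Stand-in denoiser: shrinks values toward zero proportionally to
    sigma, mimicking a posterior-mean estimate under a zero-mean Gaussian
    prior (illustrative only)."""
    shrink = prior_var / (prior_var + sigma**2)
    return [shrink * x for x in noised]

refined = refine([0.3, -0.7, 1.2], toy_denoiser, sigma=2.0)
```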
## Usage

```python
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')

# Download weights from HF
model_dict_path = hf_hub_download(repo_id="adrianrm/ambient-dataloops", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)

# Convert parameters to float32 + load
float_model_params = {
    k: v.to(torch.float32) for k, v in model_dict.items()
}
model.dit.load_state_dict(float_model_params)

# Eval mode
model = model.eval()

# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "A illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation",
    "the sailor galaxia. beautiful, realistic painting by mucha and kuvshinov and bilibin. watercolor, thick lining, manga, soviet realism",
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
```

## Citation

```bibtex
@article{rodriguez2025ambient,
  title  = {Ambient Dataloops: Generative Models for Dataset Refinement},
  author = {Rodriguez-Munoz, A. and Daspit, W. and Klivans, A. and Torralba, A. and Daskalakis, C. and Daras, G.},
  year   = {2025},
}
```

## License

The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo.