rootxhacker committed on
Commit 0870869 · verified · 1 Parent(s): aedeb9d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +31 -0
README.md ADDED
@@ -0,0 +1,31 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: text-to-image
+ library_name: safetensors
+ tags: [hobbylm, text-to-image, diffusion, dit, flow-matching]
+ ---
+
+ # HobbyLM-Image (1024px text-to-image DiT)
+
+ An in-context latent **flow-matching DiT** that generates 1024×1024 images, trained on a $300-class budget.
+ It operates in the **DC-AE f32c32 (SANA-1.1)** latent space and is conditioned on **CLIP-L** text features.
+
+ ## Components (frozen, not included)
+
+ - VAE: `mit-han-lab/dc-ae-f32c32-sana-1.1-diffusers` (32× spatial compression → 32×32×32 latent at 1024px).
+ - Text encoder: `openai/clip-vit-large-patch14`.
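
The latent shape quoted for the VAE follows from the `f32c32` naming: 32× spatial downsampling and 32 latent channels. A minimal sketch of that arithmetic is below. `latent_shape` and `compression_ratio` are hypothetical helpers written for illustration and are not part of this repository.

```python
def latent_shape(image_size, spatial_compression=32, latent_channels=32):
    """Return (channels, height, width) of the latent for a square image.

    The defaults mirror the f32c32 naming: 32x spatial compression, 32 channels.
    """
    if image_size % spatial_compression != 0:
        raise ValueError("image size must be a multiple of the compression factor")
    side = image_size // spatial_compression
    return (latent_channels, side, side)


def compression_ratio(image_size, spatial_compression=32, latent_channels=32):
    """Ratio of RGB pixel values to latent values for a square image."""
    channels, height, width = latent_shape(
        image_size, spatial_compression, latent_channels
    )
    return (3 * image_size * image_size) / (channels * height * width)


print(latent_shape(1024))       # (32, 32, 32)
print(compression_ratio(1024))  # 96.0
```

At 1024px the DiT therefore works on 32,768 latent values per image instead of about 3.1 million pixel values.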
+
+ ## Files
+ - `model.safetensors`: the DiT weights; `config.json`: the DiT config, `lat_std`, and the VAE `scaling_factor`.
+
+ ## Pipeline (sketch)
+ Encode the text prompt with CLIP-L → start from Gaussian latent noise → run the DiT's rectified-flow sampler
+ with classifier-free guidance (CFG) for ~100 steps → decode the latent with the DC-AE VAE → 1024px image.
+ (No GGUF file is provided: image-generation DiTs have no standard GGUF runtime.)
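
The sampling step of the pipeline can be illustrated with a toy Euler integrator. This is a sketch under stated assumptions, not the sampler shipped with the model: it assumes the common rectified-flow convention in which `t = 1` is pure noise, `t = 0` is data, and the network predicts the velocity `noise - data`. The function names, the guidance scale, and the stand-in `toy_velocity` (used in place of the real DiT) are all hypothetical.

```python
import random


def cfg_velocity(velocity_fn, x, t, cond, uncond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    velocity toward the conditional one."""
    v_cond = velocity_fn(x, t, cond)
    v_uncond = velocity_fn(x, t, uncond)
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn, noise, cond, uncond, num_steps=100, guidance_scale=4.0):
    """Integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0 (data)."""
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = cfg_velocity(velocity_fn, x, t_cur, cond, uncond, guidance_scale)
        dt = t_cur - t_next  # positive step size
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, target):
    """Stand-in for the DiT (assumption): on the straight path
    x_t = (1 - t) * data + t * noise, the velocity noise - data
    equals (x_t - data) / t. Here the conditioning is the target itself."""
    return [(xi - ti) / t for xi, ti in zip(x, target)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
sample = euler_sample(
    toy_velocity,
    noise,
    cond=[1.0, 2.0, 3.0, 4.0],
    uncond=[0.0, 0.0, 0.0, 0.0],
    num_steps=100,
    guidance_scale=1.0,  # scale 1.0 reduces to the conditional velocity
)
print(sample)
```

With a guidance scale of 1.0 the toy run lands on the conditional target up to floating-point error. In the real pipeline `velocity_fn` would be the DiT evaluated on a 32×32×32 latent with CLIP-L features as `cond`, and the result would then be decoded by the DC-AE VAE.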
+
+ ## Capabilities
+ Outputs are watermark-free, with accurate objects, cinematic scenes, and usable single-person portraits.
+ Hands and multi-person scenes come out soft (the small-model ceiling). Editing is available in a sibling 512px checkpoint.
+
+ ## License
+ Apache-2.0.