ZHANGYUXUAN-zR committed (verified)
Commit: 4af2732 · Parent: 1910a8d

Update README.md

Files changed (1): README.md (+23 −25)
README.md CHANGED
```diff
@@ -7,44 +7,28 @@ library_name: diffusers
 pipeline_tag: text-to-image
 ---
 
-# GLM-Image
-
-<div align="center">
-<img src=https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/logo.svg width="40%"/>
-</div>
+![show_case](resources/show_case.jpeg)
 <p align="center">
-👋 Join our <a href="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/8KFjEec7" target="_blank">Discord</a> community
-<br>
-📖 Check out GLM-Image's <a href="https://z.ai/blog/glm-image" target="_blank">Technical Blog</a>
-<br>
-📍 Use GLM-Image's <a href="https://docs.z.ai/guides/image/glm-image" target="_blank">API</a>
+<img src="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case.jpeg" alt="show_case" width="100%" />
 </p>
 
-
-## Case
-
-![show_case](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case.jpeg)
-
-### T2I with dense text and knowledge
-
-![show_case](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case_t2i.jpeg)
-
-### I2I
-
-![show_case](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case_i2i.jpeg)
-
-
 ## Introduction
 
 GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture. In general image generation quality, GLM‑Image aligns with mainstream latent diffusion approaches, but it shows significant advantages in text-rendering and knowledge‑intensive generation scenarios. It performs especially well in tasks requiring precise semantic understanding and complex information expression, while maintaining strong capabilities in high‑fidelity and fine‑grained detail generation. In addition to text‑to‑image generation, GLM‑Image also supports a rich set of image‑to‑image tasks including image editing, style transfer, identity‑preserving generation, and multi‑subject consistency.
 
 Model architecture: a hybrid autoregressive + diffusion decoder design.
 
-![architecture](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/architecture.jpeg)
+<p align="center">
+<img src="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/architecture_1.jpeg" alt="architecture_1" width="100%" />
+</p>
 
 + Autoregressive generator: a 9B-parameter model initialized from [GLM-4-9B-0414](https://huggingface.co/zai-org/GLM-4-9B-0414), with an expanded vocabulary to incorporate visual tokens. The model first generates a compact encoding of approximately 256 tokens, then expands to 1K–4K tokens, corresponding to 1K–2K high-resolution image outputs.
 + Diffusion Decoder: a 7B-parameter decoder based on a single-stream DiT architecture for latent-space image decoding. It is equipped with a Glyph Encoder text module, significantly improving accurate text rendering within images.
 
+<p align="center">
+<img src="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/architecture_2.jpeg" alt="architecture_2" width="70%" />
+</p>
+
 Post-training with decoupled reinforcement learning: the model introduces a fine-grained, modular feedback strategy using the GRPO algorithm, substantially enhancing both semantic understanding and visual detail quality.
 
 + Autoregressive module: provides low-frequency feedback signals focused on aesthetics and semantic alignment, improving instruction following and artistic expressiveness.
@@ -55,6 +39,20 @@ GLM-Image supports both text-to-image and image-to-image generation within a single model
 + Text-to-image: generates high-detail images from textual descriptions, with particularly strong performance in information-dense scenarios.
 + Image-to-image: supports a wide range of tasks, including image editing, style transfer, multi-subject consistency, and identity-preserving generation for people and objects.
 
+## Showcase
+
+### T2I with dense text and knowledge
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case_t2i.jpeg" alt="show_case_t2i" width="100%" />
+</p>
+
+### I2I
+
+<p align="center">
+<img src="https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/resources/show_case_i2i.jpeg" alt="show_case_i2i" width="100%" />
+</p>
+
 ## Quick Start
 
 ### transformers + diffusers Pipeline
```
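Editor's note on the figures in the architecture bullets: the stated expansion from ~256 compact tokens to 1K–4K tokens for 1K–2K outputs is consistent with simple latent-grid arithmetic. The 16× VAE downsample and 2×2 patchification factors below are assumptions (typical for latent diffusion models, not stated in this commit), as is the `image_tokens` helper name:

```python
# Sanity-check of the quoted token budget under ASSUMED factors:
# a 16x VAE downsample and 2x2 patchification (common in latent
# diffusion models, but not stated in this commit).
VAE_DOWNSAMPLE = 16
PATCH_SIZE = 2

def image_tokens(width: int, height: int) -> int:
    """Visual tokens for one image on the assumed latent grid."""
    grid_w = width // (VAE_DOWNSAMPLE * PATCH_SIZE)
    grid_h = height // (VAE_DOWNSAMPLE * PATCH_SIZE)
    return grid_w * grid_h

# Under these assumptions, a 1024px output needs ~1K tokens and a
# 2048px output ~4K tokens, matching the quoted 1K-4K range.
print(image_tokens(1024, 1024))  # 1024
print(image_tokens(2048, 2048))  # 4096
```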
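The diff is cut off at the `### transformers + diffusers Pipeline` heading, so the actual Quick Start code is not shown here. As a placeholder, a minimal sketch of what such a diffusers invocation usually looks like; the `DiffusionPipeline` entry point, the call signature, and the `snap_resolution` helper are all assumptions based on common diffusers usage, not taken from this commit — consult the full model card for the real API.

```python
# Hypothetical quick-start sketch. The pipeline class and call
# signature are ASSUMPTIONS based on typical diffusers usage, not
# confirmed by this commit.

def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple:
    """Hypothetical helper: round a requested size down to a multiple,
    since latent-space decoders generally require aligned dimensions."""
    return (width // multiple * multiple, height // multiple * multiple)

def generate(prompt: str, width: int = 1024, height: int = 1024) -> None:
    """Load the pipeline and render one image (requires GPU + weights)."""
    import torch
    from diffusers import DiffusionPipeline  # pip install diffusers

    pipe = DiffusionPipeline.from_pretrained(
        "zai-org/GLM-Image", torch_dtype=torch.bfloat16  # assumed repo id
    ).to("cuda")
    w, h = snap_resolution(width, height)
    image = pipe(prompt=prompt, width=w, height=h).images[0]
    image.save("glm_image_out.png")
```

On a CUDA machine with the weights available, `generate("A museum infographic with dense, legible labels")` would write `glm_image_out.png`; only `snap_resolution` runs without the model.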