zRzRzRzRzRzRzR committed on
Commit 2c433cc · 1 Parent(s): a32e429
Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -165,7 +165,7 @@ curl -s -X POST "http://localhost:30000/v1/images/edits" \
  + Please ensure that all text intended to be rendered in the image is enclosed in quotation marks in the model input. We strongly recommend using GLM-4.7 to enhance prompts for higher image quality; see [our GitHub script](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/examples/prompt_utils.py) for more details.
  + The AR model used in GLM-Image is configured with `do_sample=True`, a temperature of `0.9`, and a `top_p` of `0.75` by default. A higher temperature produces more diverse and richer outputs, but it can also reduce output stability.
  + The target image resolution must be divisible by 32; otherwise, an error is raised.
- + Because the inference optimizations for this architecture are currently limited, the runtime cost is still relatively high. It requires either a single GPU with more than 80GB of memory, or a multi-GPU setup.
+ + Because inference optimizations for this architecture are currently limited, the runtime cost is still relatively high. You can set `enable_model_cpu_offload=True` to run it with `~23GB` of GPU memory, at the cost of slower inference.
  + vLLM-Omni and SGLang (with AR speedup) support is currently being integrated; stay tuned. For inference cost, see our GitHub.

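The `temperature=0.9` / `top_p=0.75` defaults mentioned above can be illustrated with a minimal nucleus-sampling sketch in plain Python. This is a hypothetical illustration of how temperature and top-p interact, not GLM-Image's actual sampler:

```python
import math
import random

def top_p_sample(logits, temperature=0.9, top_p=0.75, rng=None):
    """Sample a token index using temperature scaling + nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature scaling: lower temperature sharpens the distribution,
    # higher temperature flattens it (more diverse, less stable outputs).
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept set and draw a sample.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a strongly peaked distribution the nucleus collapses to a single token, which is why lowering `top_p` trades diversity for stability.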
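The divisible-by-32 resolution constraint can be checked client-side before submitting a request. The helpers below are hypothetical conveniences, not part of the GLM-Image API:

```python
def snap_to_multiple_of_32(size: int) -> int:
    """Round an image dimension down to the nearest multiple of 32."""
    if size < 32:
        raise ValueError("dimension must be at least 32")
    return size - (size % 32)

def validate_resolution(width: int, height: int) -> None:
    """Raise ValueError if either dimension would be rejected by the model."""
    for name, value in (("width", width), ("height", height)):
        if value % 32 != 0:
            raise ValueError(f"{name}={value} is not divisible by 32")
```

For example, a requested height of 1000 would snap down to 992 (31 × 32), while 1024 × 768 passes validation unchanged.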
  ## Model Performance