zRzRzRzRzRzRzR committed on
Commit 2c433cc · 1 Parent(s): a32e429
Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -165,7 +165,7 @@ curl -s -X POST "http://localhost:30000/v1/images/edits" \
  + Please ensure that all text intended to be rendered in the image is enclosed in quotation marks in the model input. We strongly recommend using GLM-4.7 to enhance prompts for higher image quality; see [our GitHub script](https://raw.githubusercontent.com/zai-org/GLM-Image/refs/heads/main/examples/prompt_utils.py) for more details.
  + The AR model used in GLM-Image is configured with `do_sample=True`, a temperature of `0.9`, and a `top_p` of `0.75` by default. A higher temperature produces more diverse and richer outputs, but it can also reduce output stability.
  + The target image resolution must be divisible by 32; otherwise, an error is raised.
- + Because the inference optimizations for this architecture are currently limited, the runtime cost is still relatively high. It requires either a single GPU with more than 80GB of memory, or a multi-GPU setup.
+ + Because inference optimizations for this architecture are currently limited, the runtime cost is still relatively high. You can set `enable_model_cpu_offload=True` to run it with `~23GB` of GPU memory, at the cost of slower inference.
  + vLLM-Omni and SGLang (with AR speedup) support is currently being integrated; stay tuned. For inference cost, see our GitHub.

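The `temperature=0.9` / `top_p=0.75` defaults mentioned above can be illustrated with a minimal nucleus-sampling sketch in plain Python. This is a hypothetical illustration of how temperature and top-p interact, not GLM-Image's actual sampler:

```python
import math
import random

def top_p_sample(logits, temperature=0.9, top_p=0.75, rng=None):
    """Sample a token index using temperature scaling + nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature scaling: lower temperature sharpens the distribution,
    # higher temperature flattens it (more diverse, less stable outputs).
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept set and draw a sample.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a strongly peaked distribution the nucleus collapses to a single token, which is why lowering `top_p` trades diversity for stability.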
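The divisible-by-32 resolution constraint can be checked client-side before submitting a request. The helpers below are hypothetical conveniences, not part of the GLM-Image API:

```python
def snap_to_multiple_of_32(size: int) -> int:
    """Round an image dimension down to the nearest multiple of 32."""
    if size < 32:
        raise ValueError("dimension must be at least 32")
    return size - (size % 32)

def validate_resolution(width: int, height: int) -> None:
    """Raise ValueError if either dimension would be rejected by the model."""
    for name, value in (("width", width), ("height", height)):
        if value % 32 != 0:
            raise ValueError(f"{name}={value} is not divisible by 32")
```

For example, a requested height of 1000 would snap down to 992 (31 × 32), while 1024 × 768 passes validation unchanged.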
  ## Model Performance