How to run the Int4 quantized model?
#10
by CharlesLincoln - opened
Same as the title
In the GitHub repo, open basic_demo/trans_cli_vision_demo.py, comment out the default loading block, and uncomment the INT4 block:
# model = AutoModel.from_pretrained(
#     MODEL_PATH,
#     trust_remote_code=True,
#     # attn_implementation="flash_attention_2",  # Use Flash Attention
#     torch_dtype=torch.bfloat16,
#     device_map="auto",
# ).eval()

## For INT4 inference
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True
).eval()
Note that you can also quantize the model yourself and run it with vLLM, using this branch and examples.
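For a rough idea of what the INT4 load above buys you, here is a back-of-the-envelope sketch of weight memory at 16 bits versus 4 bits per weight. The parameter count is an assumption taken from the "9b" in the model name; the real checkpoint (including the vision tower) may be larger. The estimate also ignores quantization metadata, any layers left unquantized, activations, and the KV cache, so treat the numbers as a lower bound rather than a measurement.

```python
# Rough weight-only memory estimate for a model at different precisions.
# PARAM_COUNT is an assumption (nominal "9b" from the model name), not a measured value.

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(param_count: int, bits_per_weight: float) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return param_count * bits_per_weight / 8 / GIB


PARAM_COUNT = 9_000_000_000  # hypothetical; the actual checkpoint may differ

bf16_gib = weight_memory_gib(PARAM_COUNT, 16)  # torch.bfloat16 weights
int4_gib = weight_memory_gib(PARAM_COUNT, 4)   # 4-bit quantized weights

print(f"bf16 weights: {bf16_gib:.1f} GiB")
print(f"int4 weights: {int4_gib:.1f} GiB")
```

In practice the observed GPU usage will be higher than the int4 figure, so check with your own hardware before relying on it.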