Instructions to use zai-org/chatglm-6b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use zai-org/chatglm-6b with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("zai-org/chatglm-6b", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
Can I specify the number of threads used for CPU inference?
#13
by byzp - opened
CPU inference seems to use half of the available cores by default. Can I increase this to get faster speeds?
Of course you can.
Pass parallel_num=your_threads_num to quantize() when quantizing.
Or, if the model has already been loaded, call quantize() again and reset the number of CPU cores used for quantization:
```python
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).cpu().float()
model = model.quantize(bits=4, parallel_num=your_threads_num)
```
However, an inappropriate parallel_num can hurt efficiency; it is not recommended to exceed the number of CPU cores.
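The cap recommended above can be applied programmatically. A minimal sketch, assuming you want to clamp a requested thread count before passing it to quantize(); the helper name pick_parallel_num is hypothetical, and note that os.cpu_count() reports logical cores, which may be twice the physical core count:

```python
import os

def pick_parallel_num(requested=None):
    """Clamp a requested thread count to the number of logical cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if requested is None:
        return cores             # default: use every logical core
    # never below 1, never above the core count the answer warns about
    return max(1, min(requested, cores))

# The result would then be passed on, e.g.:
# model = model.quantize(bits=4, parallel_num=pick_parallel_num(8))
```

If hyperthreading is enabled, you may get better throughput by clamping to half of os.cpu_count() (the physical core count on most machines); measure both before settling on a value.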