LongWriter-glm4-9b

Chat completion example with llama-cpp-python (`llm` is created with `Llama.from_pretrained`, as shown below):

```python
llm.create_chat_completion(
    messages = [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```
Original model link: https://huggingface.co/THUDM/LongWriter-glm4-9b
Model by: THUDM
Quants by: QuantPanda
GGUF quantization for llama.cpp and similar applications.
Example:

```shell
./llama-cli -m LongWriter-glm4-9B-Q5_K_M.gguf -p "You are a helpful AI assistant." --conversation
```

If the model takes too long to load, you can reduce the context size with `--ctx-size`.

Example with a smaller context size:

```shell
./llama-cli -m LongWriter-glm4-9B-Q5_K_M.gguf -p "You are a helpful AI assistant." --conversation --ctx-size 4096
```
Available quantizations:
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit
- 32-bit
Model tree for QuantPanda/LongWriter-glm4-9B-GGUF
Base model: zai-org/LongWriter-glm4-9b
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantPanda/LongWriter-glm4-9B-GGUF",
    filename="",  # name of the quantized .gguf file to download from this repo
)
```
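Putting the snippets above together, a minimal end-to-end sketch. The `extract_reply` helper is hypothetical (not part of llama-cpp-python), the GGUF filename is taken from the llama-cli example above, and the sketch assumes `create_chat_completion` returns an OpenAI-style response dict:

```python
def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat-completion dict.

    Hypothetical helper: llama-cpp-python returns the response as a plain
    dict, with the generated text under choices[0]["message"]["content"].
    """
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Kept under the main guard: this downloads the GGUF file from the
    # Hub on first run (several GB for the Q5_K_M quant).
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="QuantPanda/LongWriter-glm4-9B-GGUF",
        filename="LongWriter-glm4-9B-Q5_K_M.gguf",  # filename from the llama-cli example above
    )
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is the capital of France?"}]
    )
    print(extract_reply(response))
```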