Hermes Tool Loops in v1.1.5
This is the discussion about the tool loops that occur in v1.1.5 when using Hermes. OpenHands and OpenCode seem to be unaffected. I think it's a harness-specific thing. v1.1.5 uses robust tool-call error recovery logic, so it shouldn't happen, but since it still does, I will look into this manually (installing Hermes myself and testing).
So far, tool-calling loops and cron tool-call loops have been reported.
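For anyone collecting reports, here is a minimal sketch of how a tool loop could be spotted in a session log. It assumes the calls have already been extracted into a list of dicts with `name` and `arguments` keys; that shape, the function name `find_tool_loop`, and the threshold of 3 are all hypothetical and not taken from Hermes or from the template.

```python
import json
from typing import Any, Dict, List, Optional


def find_tool_loop(calls: List[Dict[str, Any]], threshold: int = 3) -> Optional[Dict[str, Any]]:
    """Return the first call repeated `threshold` times in a row, else None.

    Each entry in `calls` is assumed (hypothetically) to look like
    {"name": "write_file", "arguments": {...}}.
    """
    previous = None
    run = 0
    for call in calls:
        # Serialize arguments with sorted keys so key order does not matter.
        key = (call["name"], json.dumps(call["arguments"], sort_keys=True))
        if key == previous:
            run += 1
        else:
            previous, run = key, 1
        if run >= threshold:
            return call
    return None
```

This only catches identical consecutive calls. A loop that alternates between two tools, or one that changes an argument slightly on each attempt, would need a looser comparison.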
I was testing in Hermes. With 1.1.5, Hermes cannot write code to a local file: it times out 100% of the time (15 out of 15 tries). It can generate and display code, but it probably cannot use some specific tools, such as the one for writing files.
Changing the streaming or reasoning settings does not help. (In the logs, the LLM finishes the task in 12s, but Hermes times out after 300s, so the response is clearly getting lost somewhere.)
I am using this with OpenCode, the latest vLLM, and Qwen3.6 27B, and I'm noticing that it sometimes stops abruptly.
When it does, the text looks like it's about to call some tool, but it never does and just stops instead ...
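One way to narrow this down might be to look at the raw response from the OpenAI-compatible endpoint, bypassing the harness. The sketch below classifies a single entry of `choices` from a chat completion response. The `message.tool_calls` and `finish_reason` fields are part of the OpenAI-compatible response format; the `<tool_call>` marker is an assumption about how a Qwen-style template writes tool calls as text, and the function name and category labels are made up for illustration.

```python
from typing import Any, Dict


def classify_choice(choice: Dict[str, Any]) -> str:
    """Classify one `choices[i]` entry of a chat completion response."""
    message = choice.get("message") or {}
    if message.get("tool_calls"):
        # The server parsed a structured tool call.
        return "tool_call"
    content = message.get("content") or ""
    if "<tool_call>" in content:
        # Assumed marker: the call came back as plain text and was not parsed.
        return "unparsed_tool_call"
    if choice.get("finish_reason") == "length":
        # Generation hit the token limit before finishing.
        return "truncated"
    return "text"
```

If the abrupt stops come back as `unparsed_tool_call`, that would point at the template or the server-side parser. If they come back as `text`, the model simply ended its turn without emitting a call.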
@ABLomas
Did you notice this with specific code, or does it happen on random files?
I ask because I discovered an error myself yesterday: it coded fine for hours in OpenHands, but when working on a certain part of the code it stalled. Probably the same cause.
Could you tell me whether other templates, like froggeric v16 for example, handled this part successfully, or does it happen with every template?
OK, I've spun up my own Hermes instance now. Tool calling is indeed totally broken with the current template version, independent of editing code files. I'm working on this now. No further information is needed (but you can still post it if you want).
My setup:
--host 0.0.0.0 -fa 1 --fit-ctx 262144 --min-p 0.0 --fit 1 -b 2048 -ub 512 --no-mmap -ctk q8_0 -ctv q8_0 --jinja -m Qwen3.6-35B-A3B-IQ4_XS-4.15bpw.gguf --temp 0.6 --top-p 0.95 --top-k 20 --presence-penalty 0.0 --repeat-penalty 1.0 --chat-template-kwargs "{\"preserve_thinking\":true}" --no-mmproj -np 1 --alias Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf --reasoning-budget 4096 --metrics --reasoning-budget-message "[SYSTEM ALERT: Reasoning budget exceeded. I am stuck in a loop or overcomplicating. I must stop IMMEDIATELY and use the ask_followup_question tool to notify the user and ask for guidance.]" --chat-template-file qwen3.6_chat_template.txt -to 900
Model - https://huggingface.co/byteshape/Qwen3.6-35B-A3B-GGUF/blob/main/Qwen3.6-35B-A3B-IQ4_XS-4.15bpw.gguf
Hermes - v0.14.0 (2026.5.16)
Yesterday I tried installing v1.1.5 with the Pi coding agent, and tool calls seemed to be made incorrectly. I don't know why. 😁
I had to revert to froggeric's v19.
@StableQuant I just wanna say that froggeric's v19 template is the best for Hermes; v16 has loops as well.