Initial v0.1 closed-beta release
- .gitattributes +1 -0
- CipherModel-1.5B-Q4_K_M.gguf +3 -0
- README.md +110 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+CipherModel-1.5B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
CipherModel-1.5B-Q4_K_M.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cc324af070c2ecbfd324a30884d2f951a7ff756aba85cb811a6ec436933bb046
size 1117320768
README.md
ADDED
@@ -0,0 +1,110 @@
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
pipeline_tag: text-generation
tags:
- code
- coding-assistant
- llama-cpp
- gguf
- ciphercode
- vscode
library_name: gguf
---

# CipherModel-1.5B

> **The model behind CipherCode™, the AI coding assistant that writes code the way YOU would.**
> Closed-beta v0.1, by **Lila AI LLC**.

This repository hosts the GGUF Q4_K_M quantization served by the [CipherCode VS Code extension](https://github.com/lila-ai-llc/ciphercode-vscode) (closed beta). It is built on top of [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) and is suitable for inline code completion, refactor / explain / fix / docstring tasks, and short conversational coding chat.

## What's in this repo

| File | Size | Format |
|---|---|---|
| `CipherModel-1.5B-Q4_K_M.gguf` | ~1.12 GB (1.04 GiB) | GGUF Q4_K_M (llama.cpp) |

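The Git LFS pointer added in this commit records the byte size and SHA-256 digest of the GGUF file, so a downloaded copy can be checked against it. The following is a minimal sketch using only Python's standard library; the helper names `sha256_of` and `matches_pointer` are illustrative, and the file is assumed to sit in the working directory.

```python
import hashlib
import os

# Values copied from the Git LFS pointer in this commit.
EXPECTED_OID = "cc324af070c2ecbfd324a30884d2f951a7ff756aba85cb811a6ec436933bb046"
EXPECTED_SIZE = 1117320768


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, oid=EXPECTED_OID, size=EXPECTED_SIZE):
    """True if the file at `path` has the expected byte size and digest."""
    if os.path.getsize(path) != size:
        return False  # cheap check first; avoids hashing a truncated download
    return sha256_of(path) == oid


if __name__ == "__main__":
    # Hypothetical location: the GGUF downloaded into the working directory.
    gguf = "CipherModel-1.5B-Q4_K_M.gguf"
    if os.path.exists(gguf):
        print("OK" if matches_pointer(gguf) else "MISMATCH")
    else:
        print(f"{gguf} not found; download it first")
```

Checking the size before hashing is a small design choice: an interrupted download fails immediately instead of after reading about a gigabyte from disk.
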
## What this is

- **A redistribution of `Qwen2.5-Coder-1.5B-Instruct` in GGUF Q4_K_M format**, branded as CipherModel-1.5B for use in the CipherCode extension's closed beta.
- **No fine-tuning has been applied yet at v0.1.** The "Cipher Persona" style adaptation that ships with CipherCode operates entirely at the system-prompt level, injecting the developer's detected style into every request; the model weights are unchanged from the base Qwen model.
- A future v0.2+ release of this repo will contain a true LoRA fine-tune merged into the base.

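Because the style adaptation lives in the system prompt rather than in the weights, the general idea can be approximated against any OpenAI-compatible chat endpoint. The sketch below is illustrative only: the `style` fields, the prompt wording, and the `build_messages` helper are assumptions and do not reproduce the extension's actual prompt, which this README does not publish.

```python
def build_messages(style, user_request):
    """Prepend a system prompt describing the developer's style to a request.

    `style` is a hypothetical dict of detected preferences; the real
    CipherCode prompt format is not documented here.
    """
    rules = "\n".join(f"- {name}: {value}" for name, value in sorted(style.items()))
    system_prompt = (
        "You are a coding assistant. Match the developer's existing style:\n"
        + rules
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]


# Example with made-up style preferences.
detected_style = {"indentation": "4 spaces", "naming": "snake_case", "quotes": "double"}
messages = build_messages(detected_style, "write a python fizzbuzz")
print(messages[0]["content"])
```

The resulting list can be sent as the `messages` field of a chat completion request; since only the prompt changes, the same GGUF serves every user.
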
## Usage

### Via the CipherCode VS Code extension (recommended)

```bash
# Friends of Lila AI: install the .vsix sent to you privately
code --install-extension ciphercode-0.1.0.vsix
```

The extension talks to a private Cloud Run endpoint that serves this model via `llama-server`. End users of the extension never need to download this GGUF themselves.

### Direct with llama.cpp

```bash
# Download the GGUF
huggingface-cli download guhantech/CipherModel-1.5B CipherModel-1.5B-Q4_K_M.gguf --local-dir .

# Run llama-server (-np 5 allows up to five requests to be processed in parallel)
llama-server \
  -m CipherModel-1.5B-Q4_K_M.gguf \
  --host 0.0.0.0 --port 8080 \
  --ctx-size 4096 -np 5

# Send a chat completion request
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"cipher-model","messages":[{"role":"user","content":"write a python fizzbuzz"}],"max_tokens":256}'
```

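The same request can be issued from Python without extra dependencies. This is a minimal sketch that assumes the server was started with the host and port shown above; the helper names `build_chat_request`, `extract_reply`, and `chat` are illustrative.

```python
import json
import urllib.request

# Assumed to match the --host/--port used when starting llama-server.
BASE_URL = "http://localhost:8080"


def build_chat_request(prompt, max_tokens=256, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": "cipher-model",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def chat(prompt, timeout=60):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(resp.read())
```

With the server running, `print(chat("write a python fizzbuzz"))` should print the model's answer. Building the request separately from sending it keeps the payload easy to inspect or test without a live server.
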
### Direct with `llama-cpp-python`

```python
from llama_cpp import Llama

# Load the local GGUF with a 4K-token context window
llm = Llama(model_path="CipherModel-1.5B-Q4_K_M.gguf", n_ctx=4096)
# Raw text completion: the model continues the function signature
out = llm("def fizzbuzz(n):", max_tokens=256)
print(out["choices"][0]["text"])
```

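The call above sends a raw completion prompt. Since the base model is instruction-tuned, chat-style requests may work better when the prompt follows the ChatML layout that Qwen models use. The sketch below builds such a prompt by hand; `to_chatml` is a hypothetical helper, and the system message text is made up for illustration.

```python
def to_chatml(messages):
    """Render chat messages as a ChatML prompt ending with an open assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # leave the assistant turn open
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python fizzbuzz."},
])
print(prompt)
```

The resulting string can be passed to the `llm(...)` call in place of the raw prompt, ideally with `stop=["<|im_end|>"]` so generation ends with the assistant turn. llama-cpp-python's `create_chat_completion` method is an alternative that applies a chat template for you.
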
## Specifications

- **Architecture:** Qwen2.5-Coder (transformer)
- **Parameters:** 1.5 B
- **Context window:** 32 K tokens (we run at 4 K in production to save memory)
- **Quantization:** Q4_K_M
- **License:** Apache 2.0 (inherited from base model)
- **Languages supported:** strong in Python, JavaScript, TypeScript, Java, Go, Rust, C/C++; see Qwen2.5-Coder's eval table for details

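The memory saved by running at 4 K instead of 32 K comes mostly from the key/value cache, which grows linearly with context length. The estimate below is a rough sketch: the layer count, KV-head count, and head dimension are assumed values for the base 1.5 B architecture and are not stated in this README, so check them against the base model's `config.json` before relying on the numbers.

```python
# Assumed architecture values (not stated in this README); verify against
# the base model's config.json.
NUM_LAYERS = 28
NUM_KV_HEADS = 2
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16 cache entries


def kv_cache_bytes(context_tokens):
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # Leading factor 2 accounts for storing both keys and values.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


for tokens in (4096, 32768):
    print(f"{tokens:>6} tokens: {kv_cache_bytes(tokens) / 2**20:.0f} MiB")
```

Under these assumptions the cache needs about 112 MiB at 4 K tokens and about 896 MiB at 32 K, which is comparable to the size of the quantized weights themselves.
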
## Limitations

- Quality is meaningfully lower than Qwen2.5-Coder-7B / 32B. For complex multi-file reasoning or long-context tasks, prefer the larger sizes.
- Q4_K_M trades an estimated ~1–2% quality for a file roughly 3× smaller than full fp16 (about 1.1 GB versus about 3 GB). Acceptable for autocomplete and single-file tasks.
- This is a closed-beta artifact; no SLAs, no support guarantees.

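The size trade-off can be checked from figures already in this repository: the nominal parameter count and the byte size recorded in the Git LFS pointer. This is a back-of-the-envelope sketch; it uses the rounded 1.5 B parameter count, so the results are approximate.

```python
PARAMS = 1.5e9            # nominal parameter count from the Specifications list
GGUF_BYTES = 1117320768   # size recorded in this commit's Git LFS pointer

fp16_bytes = PARAMS * 2                     # 2 bytes per weight at fp16
ratio = fp16_bytes / GGUF_BYTES             # how many times smaller the GGUF is
bits_per_weight = GGUF_BYTES * 8 / PARAMS   # effective storage per weight

print(f"fp16 estimate: {fp16_bytes / 1e9:.2f} GB")
print(f"size ratio: {ratio:.2f}x")
print(f"effective bits per weight: {bits_per_weight:.2f}")
```

With these inputs the file works out to roughly 6 bits per weight, so the reduction relative to fp16 is closer to 3× than to the 4× that a nominal 4-bit format would suggest. A likely reason is that Q4_K_M keeps some tensors at higher precision than 4 bits, and the file also carries metadata.
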
## Citation / credits

Built on top of:

```bibtex
@article{hui2024qwen2,
  title={Qwen2.5-Coder Technical Report},
  author={Binyuan Hui and Jian Yang and Zeyu Cui and Jiaxi Yang and Dayiheng Liu and Lei Zhang and Tianyu Liu and Jiajun Zhang and Bowen Yu and Keming Lu and Kai Dang and Yang Fan and Yichang Zhang and An Yang and Rui Men and Fei Huang and Bo Zheng and Yibo Miao and Shanghaoran Quan and Yunlong Feng and Xingzhang Ren and Xuancheng Ren and Jingren Zhou and Junyang Lin},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}
```

## Trademark

CipherCode™ and Cipher Persona™ are trademarks of **Lila AI LLC**. All rights reserved.

The CipherModel weights themselves are released under Apache 2.0 (inherited from Qwen). The trademarks restrict only how you may name and brand derivative work; the underlying weights are free to use.

---

© 2026 Lila AI LLC.