Instructions to use guhantech/CipherModel-1.5B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use guhantech/CipherModel-1.5B with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub
# (Llama.from_pretrained fetches the GGUF through the huggingface-hub package)

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="guhantech/CipherModel-1.5B",
	filename="CipherModel-1.5B-Q4_K_M.gguf",
)

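# Ask a question; the return value is an OpenAI-style completion dict.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])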
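
For longer replies it can be more convenient to stream the output as it is generated. The sketch below assumes the OpenAI-style chunk layout that `create_chat_completion` yields when called with `stream=True`; the helper names `iter_stream_text` and `collect_stream_text` are hypothetical and not part of llama-cpp-python.

```python
from typing import Iterable, Iterator


def iter_stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style streaming chunks."""
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # The first chunk typically carries only the role and the last
            # only a finish_reason, so "content" may be missing.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def collect_stream_text(chunks: Iterable[dict]) -> str:
    """Join all streamed fragments into the full reply."""
    return "".join(iter_stream_text(chunks))


# Intended usage with the `llm` object created above (not executed here):
# stream = llm.create_chat_completion(messages=[...], stream=True)
# for piece in iter_stream_text(stream):
#     print(piece, end="", flush=True)
```

Keeping the parsing separate from the model call makes it possible to check the helper against hand-written chunks without downloading the GGUF.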

Local Apps

llama.cpp

How to use guhantech/CipherModel-1.5B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf guhantech/CipherModel-1.5B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf guhantech/CipherModel-1.5B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf guhantech/CipherModel-1.5B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf guhantech/CipherModel-1.5B:Q4_K_M

Use Docker

docker model run hf.co/guhantech/CipherModel-1.5B:Q4_K_M


vLLM

How to use guhantech/CipherModel-1.5B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "guhantech/CipherModel-1.5B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "guhantech/CipherModel-1.5B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
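
The same request can be issued from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the function names `build_chat_request` and `send_chat_request` are illustrative, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "guhantech/CipherModel-1.5B",
    base_url: str = "http://localhost:8000/v1",  # assumed vLLM default port
) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Intended usage once the server is running (not executed here):
# print(send_chat_request(build_chat_request("What is the capital of France?")))
```

Pointing `base_url` at `http://localhost:8080/v1` should let the same helper talk to the `llama-server` instances shown in the llama.cpp section, since both expose an OpenAI-compatible endpoint.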

Use Docker

docker model run hf.co/guhantech/CipherModel-1.5B:Q4_K_M

Ollama
How to use guhantech/CipherModel-1.5B with Ollama:
```
ollama run hf.co/guhantech/CipherModel-1.5B:Q4_K_M
```

Unsloth Studio

How to use guhantech/CipherModel-1.5B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for guhantech/CipherModel-1.5B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for guhantech/CipherModel-1.5B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for guhantech/CipherModel-1.5B to start chatting

Pi

How to use guhantech/CipherModel-1.5B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "guhantech/CipherModel-1.5B:Q4_K_M"
        }
      ]
    }
  }
}
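
If `~/.pi/agent/models.json` already lists other providers, pasting the JSON above over the file would drop them. The sketch below merges the `llama-cpp` entry into an existing configuration instead. It assumes the file follows exactly the layout shown above; the helpers `add_llama_cpp_provider` and `update_models_file` are hypothetical and not part of Pi.

```python
import copy
import json
from pathlib import Path


def add_llama_cpp_provider(
    config: dict,
    model_id: str,
    base_url: str = "http://localhost:8080/v1",
) -> dict:
    """Return a copy of `config` with the llama-cpp provider and model added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    provider = merged.setdefault("providers", {}).setdefault("llama-cpp", {})
    provider.update(
        {"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"}
    )
    models = provider.setdefault("models", [])
    # Avoid duplicate entries when the helper is run more than once.
    if not any(m.get("id") == model_id for m in models):
        models.append({"id": model_id})
    return merged


def update_models_file(path: Path, model_id: str) -> None:
    """Read the config at `path` (if any), merge the provider, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = add_llama_cpp_provider(config, model_id)
    path.write_text(json.dumps(merged, indent=2) + "\n")


# Intended usage (not executed here):
# update_models_file(Path.home() / ".pi/agent/models.json",
#                    "guhantech/CipherModel-1.5B:Q4_K_M")
```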

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use guhantech/CipherModel-1.5B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf guhantech/CipherModel-1.5B:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default guhantech/CipherModel-1.5B:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use guhantech/CipherModel-1.5B with Docker Model Runner:
```
docker model run hf.co/guhantech/CipherModel-1.5B:Q4_K_M
```

Lemonade

How to use guhantech/CipherModel-1.5B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull guhantech/CipherModel-1.5B:Q4_K_M

Run and chat with the model

lemonade run user.CipherModel-1.5B-Q4_K_M

List all available models

lemonade list

adminguhantech committed 17 days ago

Commit 7f6682a (verified) · 1 parent: 2c89582

Polish model card: lead with Cipher features, credit Qwen as foundation

Files changed (1)

README.md +107 -42

@@ -11,89 +11,154 @@ tags:
 - gguf
 - ciphercode
 - vscode
 library_name: gguf
 ---
 # CipherModel-1.5B
-> **The model behind CipherCode™ — the AI coding assistant that writes code the way YOU would.**
-> Closed-beta v0.1, by **Lila AI LLC**.
-This repository hosts the GGUF Q4_K_M quantization served by the [CipherCode VS Code extension](https://github.com/lila-ai-llc/ciphercode-vscode) (closed beta). It is built on top of [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct) and is suitable for inline code completion, refactor / explain / fix / docstring tasks, and short conversational coding chat.
-## What's in this repo
-| File | Size | Format |
-|---|---|---|
-| `CipherModel-1.5B-Q4_K_M.gguf` | ~1.07 GB | GGUF Q4_K_M (llama.cpp) |
-## What this is
-- **A redistribution of `Qwen2.5-Coder-1.5B-Instruct` in GGUF Q4_K_M format**, branded as CipherModel-1.5B for use in the CipherCode extension's closed beta.
-- **No fine-tuning has been applied yet at v0.1.** The "Cipher Persona" style adaptation that ships with CipherCode operates entirely at the system-prompt level, injecting the developer's detected style into every request — model weights are unchanged from base Qwen.
-- A future v0.2+ release of this repo will contain a true LoRA fine-tune merged into the base.
-## Usage
-### Via the CipherCode VS Code extension (recommended)
 ```bash
-# Friends of Lila AI: install the .vsix sent to you privately
 code --install-extension ciphercode-0.1.0.vsix
 ```
-The extension talks to a private Cloud Run endpoint that serves this model via `llama-server`. End users of the extension never need to download this GGUF themselves.
-### Direct with llama.cpp
 ```bash
-# Download the GGUF
-huggingface-cli download guhantech/CipherModel-1.5B CipherModel-1.5B-Q4_K_M.gguf --local-dir .
-# Run llama-server
 llama-server \
   -m CipherModel-1.5B-Q4_K_M.gguf \
   --host 0.0.0.0 --port 8080 \
   --ctx-size 4096 -np 5
-# Hit it
 curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
-  -d '{"model":"cipher-model","messages":[{"role":"user","content":"write a python fizzbuzz"}],"max_tokens":256}'
 ```
-### Direct with `llama-cpp-python`
 ```python
 from llama_cpp import Llama
 llm = Llama(model_path="CipherModel-1.5B-Q4_K_M.gguf", n_ctx=4096)
 out = llm("def fizzbuzz(n):", max_tokens=256)
 print(out["choices"][0]["text"])
 ```
-## Specifications
-- **Architecture:** Qwen2.5-Coder (transformer)
-- **Parameters:** 1.5 B
-- **Context window:** 32 K (we run at 4 K in production for memory)
-- **Quantization:** Q4_K_M
-- **License:** Apache 2.0 (inherited from base model)
-- **Languages supported:** strong in Python, JavaScript, TypeScript, Java, Go, Rust, C/C++ — see Qwen2.5-Coder's eval table for details
-## Limitations
-- Quality is meaningfully lower than Qwen-Coder-7B / 32B. For complex multi-file reasoning or long-context tasks, prefer the larger sizes.
-- Q4_K_M trades ~1–2% quality for ~4× smaller size vs full fp16. Acceptable for autocomplete and single-file tasks.
-- This is a closed-beta artifact; no SLAs, no support guarantees.
-## Citation / credits
-Built on top of:
 ```bibtex
 @article{hui2024qwen2,
   title={Qwen2.5-Coder Technical Report},
-  author={Binyuan Hui and Jian Yang and Zeyu Cui and Jiaxi Yang and Dayiheng Liu and Lei Zhang and Tianyu Liu and Jiajun Zhang and Bowen Yu and Keming Lu and Kai Dang and Yang Fan and Yichang Zhang and An Yang and Rui Men and Fei Huang and Bo Zheng and Yibo Miao and Shanghaoran Quan and Yunlong Feng and Xingzhang Ren and Xuancheng Ren and Jingren Zhou and Junyang Lin},
   journal={arXiv preprint arXiv:2409.12186},
   year={2024}
 }
@@ -101,10 +166,10 @@ Built on top of:
 ## Trademark
-CipherCode™ and Cipher Persona™ are trademarks of **Lila AI LLC**. All rights reserved.
-The CipherModel weights themselves are released under Apache 2.0 (inherited from Qwen). The trademarks restrict only how you may name and brand derivative work — the underlying weights are free to use.
 ---
-© 2026 Lila AI LLC.

 - gguf
 - ciphercode
 - vscode
+- developer-tools
 library_name: gguf
 ---
 # CipherModel-1.5B
+> **Your IDE's new best friend.**
+> The model behind [CipherCode](https://huggingface.co/guhantech) — the AI coding assistant that learns *your* style, remembers *your* projects, and writes code in *your* voice.
+>
+> By **Lila AI LLC** · Closed beta v0.1
+---
+## What CipherCode Delivers
+CipherCode isn't another generic completion plugin. It's a complete coding companion that lives natively inside VS Code and adapts to *you*.
+### Cipher Persona — Your Style, Learned
+The first time you open a workspace, CipherCode silently scans your code and detects:
+- Naming conventions (camelCase / snake_case / PascalCase)
+- Function style (arrow vs named declarations)
+- Async style (async/await vs `.then`)
+- Comment placement and verbosity
+- Indent size, semicolon preference, type-annotation density
+- Your most-used libraries and imports
+From that moment forward, every suggestion is generated to feel like *you* wrote it. Nothing leaves your machine — Persona lives entirely in VS Code's `globalState`.
+### Project Memory — Continuity That Actually Helps
+CipherCode remembers your project across sessions:
+| What's tracked | Where |
+|---|---|
+| Project summary (auto-detected from `package.json` / README) | `.vscode/cipher-memory.json` |
+| Project type (`node` / `python` / `other`) | local |
+| Top 10 most-edited files | local |
+| Architectural decisions you've made | local |
+| Last 20 chat messages | local |
+| Recurring patterns in your code | local |
+This context is injected into every prompt, so when you come back tomorrow, the model already knows what you're building.
+### Smart Commands
+Right-click anywhere in your editor:
+- **Explain Code** — clear summary of what's happening, even without a selection
+- **Refactor Code** — clean up while preserving your style
+- **Fix Bug** — find and patch issues, style-matched
+- **Add Comments** — comment in your voice
+- **Document This File** — language-aware doc comments (TSDoc / JSDoc / Google Python / Javadoc / XMLDoc / Doxygen / godoc / rustdoc / PHPDoc / YARD)
+- **Generate README from Project** — full README from your code structure
+Plus an inline chat sidebar with persistent history, code-block copy buttons, "Insert at cursor" actions, and a stop button that actually stops.
+### Privacy by Architecture
+- Code stays on your machine — only the snippet you act on hits inference
+- Persona never leaves your laptop
+- Project memory lives in your workspace, not a Lila AI server
+- Self-hostable on your own GCP if you want full ownership
+- No telemetry, no accounts, no subscription
+---
+## Powered By
+Built on **[Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)** — Alibaba's state-of-the-art open code model — quantized to **Q4_K_M** for efficient CPU inference and packaged for deployment via `llama.cpp`.
+The intelligence in CipherCode comes from layering Persona detection, Project Memory, and carefully designed prompt templates on top of a strong base. The CipherCode VS Code extension orchestrates all of it; this repo hosts the weights it serves.
+A LoRA fine-tune is on the roadmap for v0.2 — trained on real-world IDE workflow patterns collected during the closed beta.
+## Specifications
+| | |
+|---|---|
+| **Architecture** | Qwen2.5-Coder transformer |
+| **Parameters** | 1.5 B |
+| **Context window** | 32 K (production runs at 4 K for efficiency) |
+| **Quantization** | Q4_K_M |
+| **File size** | 1.07 GB |
+| **License** | Apache 2.0 — free for commercial use |
+| **Strong languages** | Python, JavaScript, TypeScript, Java, Go, Rust, C/C++ |
+## Quick Start
+### Easy path — install the VS Code extension
+If Lila AI sent you the closed-beta `.vsix`:
 ```bash
 code --install-extension ciphercode-0.1.0.vsix
 ```
+Open VS Code. Welcome walkthrough opens automatically. Start typing. No setup, no token, no GCP.
+### Hands-on path — run the model locally
 ```bash
+# Pull the GGUF
+hf download guhantech/CipherModel-1.5B \
+  CipherModel-1.5B-Q4_K_M.gguf --local-dir .
+# Serve with llama-server
 llama-server \
   -m CipherModel-1.5B-Q4_K_M.gguf \
   --host 0.0.0.0 --port 8080 \
   --ctx-size 4096 -np 5
+# Make a request
 curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
+  -d '{
+    "model": "cipher-model",
+    "messages": [{"role":"user","content":"write a python fizzbuzz"}],
+    "max_tokens": 256
+  }'
 ```
+### Python (`llama-cpp-python`)
 ```python
 from llama_cpp import Llama
 llm = Llama(model_path="CipherModel-1.5B-Q4_K_M.gguf", n_ctx=4096)
 out = llm("def fizzbuzz(n):", max_tokens=256)
 print(out["choices"][0]["text"])
 ```
+## Roadmap
+| Version | Status | What's in it |
+|---|---|---|
+| **v0.1** | Live | Closed beta. Cipher Persona + Project Memory + 11 commands + chat sidebar. |
+| **v0.2** | Planned | LoRA fine-tune on collected IDE workflows. Better instruction-following. |
+| **v0.3** | Planned | Multi-file context awareness. Whole-project doc generation. |
+| **v1.0** | Planned | Public Marketplace launch. Optional hosted Pro tier for zero-setup. |
+## Citation
 ```bibtex
 @article{hui2024qwen2,
   title={Qwen2.5-Coder Technical Report},
+  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and others},
   journal={arXiv preprint arXiv:2409.12186},
   year={2024}
 }
 ## Trademark
+**CipherCode** and **Cipher Persona** are trademarks of **Lila AI LLC**. All rights reserved.
+The model weights are released under Apache 2.0 — free to use, modify, and redistribute. Trademarks restrict only how you may name and brand derivative work; the underlying weights remain unrestricted.
 ---
+<sub>© 2026 Lila AI LLC · Built for developers who don't want their AI to sound like Stack Overflow.</sub>