Hologram · Qwen2.5-Coder-7B

Agentic coding brain

q3f · 3.4 GB · streamed to Q as a key-addressable .holo object

Hologram · Live Space · Organization · Code


What this is

The Holo Code agent brain. A self-contained key object: the tokenizer is bundled, so it loads with no external dependency and runs tool-using coding workflows.

This repository is not a GGUF or Transformers checkpoint. It is a Hologram key object: the weights of Qwen/Qwen2.5-Coder-7B-Instruct re-encoded into Hologram's content-addressed .holo format so they stream, one verified block at a time, into Q, the on-device brain of the Hologram web OS. It runs in the browser on WebGPU, serverless, with nothing to install.

How it streams

The object is laid out for cold streaming from an untrusted CDN:

| File | Role |
| --- | --- |
| manifest.json | The root. Names every tensor and the key (content hash) of its block. |
| b/sha256_*.gz | The tensor blocks. Each filename is the SHA-256 of its bytes. |
| tokenizer.gguf | Bundled header (where present), so loading is fully serverless. |

Q fetches the manifest, then pulls each block by its key and re-derives sha256(block) on arrival. If a byte is wrong, the block is rejected. Nothing is trusted; everything is proven.
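The per-block check described above can be sketched in a few lines of Python. This is an illustration only, not Q's actual loader: the helper names `key_from_name` and `verify_block` are hypothetical, the filename shape `sha256_<64 hex characters>.gz` is an assumption inferred from the `b/sha256_*.gz` pattern, and the hash is assumed to cover the bytes exactly as stored (that is, the compressed file).

```python
import hashlib
import re

# Assumed filename shape: "sha256_<64 lowercase hex chars>.gz".
# The model card only shows the pattern "b/sha256_*.gz".
_BLOCK_NAME = re.compile(r"sha256_([0-9a-f]{64})\.gz$")


def key_from_name(name: str) -> str:
    """Extract the hex key embedded in a block filename (hypothetical helper)."""
    match = _BLOCK_NAME.search(name)
    if match is None:
        raise ValueError(f"not a content-addressed block name: {name!r}")
    return match.group(1)


def verify_block(name: str, data: bytes) -> bool:
    """Return True only if sha256(data) equals the key in the filename.

    Assumption: the key is the hash of the bytes as fetched, before any
    decompression.
    """
    return hashlib.sha256(data).hexdigest() == key_from_name(name)


# Usage with a locally fabricated block (no network access needed):
payload = b"example tensor bytes"
block_name = "b/sha256_" + hashlib.sha256(payload).hexdigest() + ".gz"
accepted = verify_block(block_name, payload)            # intact block
rejected = verify_block(block_name, payload + b"\x00")  # one extra byte
```

A loader built this way would discard any block for which `verify_block` returns False, which matches the reject-on-mismatch behaviour described above.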

Verify (Law L5)

The object's identity is the SHA-256 of its manifest, pinned in Q's catalog before a single byte of weight is trusted:

did:holo:sha256:539941cb060c7dd583e2e86697e53f2c5d511d597c65d09d9c780fbded2c3edf
# the hex digest printed by this command should equal the hash in the pinned identity above
curl -sL https://huggingface.co/HOLOGRAMTECH/q-qwen-coder-7b/resolve/main/manifest.json | sha256sum
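The same comparison can be done programmatically. The sketch below is a minimal Python version under stated assumptions: the function names are hypothetical, and the identity string is assumed to be the fixed prefix `did:holo:sha256:` followed by the hex digest of the raw manifest bytes, as the pinned value above suggests.

```python
import hashlib

# Pinned identity copied from the model card.
PINNED = (
    "did:holo:sha256:"
    "539941cb060c7dd583e2e86697e53f2c5d511d597c65d09d9c780fbded2c3edf"
)


def digest_from_did(did: str) -> str:
    """Return the hex digest part of a did:holo:sha256 identity.

    Assumption: the identity is exactly the prefix plus a hex SHA-256.
    """
    prefix = "did:holo:sha256:"
    if not did.startswith(prefix):
        raise ValueError(f"unsupported identity: {did!r}")
    return did[len(prefix):].lower()


def manifest_matches(did: str, manifest_bytes: bytes) -> bool:
    """True if sha256(manifest_bytes) equals the digest pinned in `did`."""
    return hashlib.sha256(manifest_bytes).hexdigest() == digest_from_did(did)


# To check the real manifest, download it first (for example with the curl
# command above, saved to manifest.json) and then run:
#
#     with open("manifest.json", "rb") as fh:
#         print(manifest_matches(PINNED, fh.read()))
```

Hashing the raw bytes, rather than parsed JSON, mirrors what `sha256sum` does in the shell command: any re-serialization of the manifest would change the digest.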

Specifications

| Property | Value |
| --- | --- |
| Architecture | Qwen2.5-Coder |
| Precision | q3f |
| Object size | 3.4 GB |
| Hidden size | 3584 |
| Layers | 28 |
| Heads (Q / KV) | 28 / 4 (GQA) |
| FFN | 18944 |
| Vocab | 152064 |
| Context | 3000 |
| Format | holo-2bit/1 |

Provenance and license

Derived from Qwen/Qwen2.5-Coder-7B-Instruct. The re-encoding is lossless-by-construction at the key level: every block is content-addressed, so the object either re-derives to its pinned identity or it is refused.

Run it

These weights load through Q, not a standard runtime. Open the Live Space or visit gethologram.ai to run Hologram, then pick Qwen2.5-Coder-7B from Q's model list.

Composed on the golden ratio. One key, everything.