---
license: mit
base_model: microsoft/bitnet-b1.58-2B-4T
pipeline_tag: text-generation
inference: false
tags:
- hologram
- q
- kappa-object
- holo
- ternary
- bitnet
- llama3
---
<div align="center">

# Hologram · BitNet-2B-4T

**Native 1.58-bit ternary brain**

`t2 · 1.58-bit ternary` · `0.69 GB` · streamed to Q as a **key-addressable `.holo` object**

[Hologram](https://gethologram.ai) · [Live Space](https://huggingface.co/spaces/HOLOGRAMTECH/hologram) · [Organization](https://huggingface.co/HOLOGRAMTECH) · [Code](https://github.com/Hologram-Technologies)

</div>

---
## What this is

This is Microsoft's BitNet b1.58 2B4T (2 billion parameters, trained on 4 trillion tokens), re-encoded into Hologram's ternary key format. BitNet b1.58 2B4T is the first open-source LLM trained natively at 1.58 bits at this scale. It is Q's default brain: fast, small, and coherent.

This repository is **not** a GGUF or Transformers checkpoint, so standard runtimes cannot load it directly. It is a **Hologram key object**: the weights of `microsoft/bitnet-b1.58-2B-4T` re-encoded into Hologram's content-addressed `.holo` format. In this format they stream, one verified block at a time, into **Q**, the on-device brain of the Hologram web OS. Q runs the model in the browser on WebGPU, with no inference server and nothing to install.
## How it streams

The object is laid out for cold streaming from an untrusted CDN:

| File | Role |
|---|---|
| `manifest.json` | The root. Names every tensor and the key (content hash) of the block that holds it. |
| `b/sha256_*.gz` | The tensor blocks. Each filename **is** the SHA-256 of the block's bytes. |
| `tokenizer.gguf` | The bundled tokenizer header, so loading needs no separate server. |

Q fetches the manifest, pulls each block by its key, and re-derives `sha256(block)` on arrival. If even one byte differs, the digest no longer matches the key and the block is rejected. Nothing is trusted; everything is proven.
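You can reproduce the per-block check offline on downloaded files. The sketch below uses two hypothetical helpers, `verify_block` and `verify_all_blocks`, which are not part of Hologram's tooling. It assumes each block is stored as `b/sha256_<64-hex-digest>.gz` and that the digest covers the stored (still gzip-compressed) bytes. The card does not say whether the hash is taken before or after decompression, so adjust this if it turns out to be the latter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-derive a block's key and compare it to its filename.
# Assumption: the digest in "sha256_<hex>.gz" covers the file's stored
# (gzip-compressed) bytes, not the decompressed tensor data.
verify_block() {
  local path="$1"
  local name expected actual
  name="$(basename "$path")"
  # Strip the "sha256_" prefix and ".gz" suffix to recover the expected key.
  expected="${name#sha256_}"
  expected="${expected%.gz}"
  # sha256sum prints "<hex>  <file>"; keep only the hex digest.
  actual="$(sha256sum "$path" | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    echo "ok       $name"
  else
    echo "REJECTED $name (got $actual)" >&2
    return 1
  fi
}

# Hypothetical helper: check every block in a directory.
# Returns non-zero if any block fails verification.
verify_all_blocks() {
  local dir="$1" status=0 f
  for f in "$dir"/sha256_*.gz; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    verify_block "$f" || status=1
  done
  return "$status"
}
```

After downloading the `b/` directory, run `verify_all_blocks ./b`. Every block prints `ok`, or the function reports the rejected block and returns 1. The check never needs to trust the server, only the key in the filename.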
## Verify (Law L5)

The object's identity is the SHA-256 of its manifest, pinned in Q's catalog before a single byte of weight is trusted:

```
did:holo:sha256:fcf835659d88d2fe6f683cf1ab8de6a6ba6214ea0deeee4b1bcf3da1a4c05412
```

To check it yourself, hash the published manifest. The printed digest should equal the hex string after `did:holo:sha256:`:

```bash
curl -sL https://huggingface.co/HOLOGRAMTECH/q-bitnet-2b/resolve/main/manifest.json | sha256sum
```
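`sha256sum` prints the digest followed by `-` (standard input) and leaves the comparison with the DID to you. The hypothetical `verify_manifest` wrapper below, which is not a Hologram tool, automates that comparison. It assumes DIDs always have the form `did:holo:sha256:<hex>` and that the hash covers the raw bytes of `manifest.json` exactly as served.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a manifest's SHA-256 matches a pinned DID.
# Assumption: DIDs have the form "did:holo:sha256:<64-hex-digest>".
# Returns 0 on match, 1 on mismatch, 2 if the DID is not in that form.
verify_manifest() {
  local manifest="$1" did="$2"
  local prefix="did:holo:sha256:"
  local expected actual
  # If removing the prefix changes nothing, the DID did not start with it.
  if [ "${did#"$prefix"}" = "$did" ]; then
    echo "unsupported DID: $did" >&2
    return 2
  fi
  expected="${did#"$prefix"}"
  actual="$(sha256sum "$manifest" | cut -d ' ' -f 1)"
  if [ "$actual" = "$expected" ]; then
    echo "manifest matches $did"
  else
    echo "MISMATCH: manifest hashes to $actual" >&2
    return 1
  fi
}
```

First save the manifest with `curl -sL -o manifest.json https://huggingface.co/HOLOGRAMTECH/q-bitnet-2b/resolve/main/manifest.json`. Then run `verify_manifest manifest.json did:holo:sha256:fcf835659d88d2fe6f683cf1ab8de6a6ba6214ea0deeee4b1bcf3da1a4c05412`. Because the manifest names every block key, a matching manifest hash transitively pins every block checked by its key.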
## Specifications

| Property | Value |
|---|---|
| Architecture | BitNet b1.58 (Llama 3 template) |
| Precision | t2 · 1.58-bit ternary |
| Object size | 0.69 GB |
| Hidden size | 2560 |
| Layers | 30 |
| Heads (Q / KV) | 20 / 5 (GQA) |
| FFN size | 6912 |
| Vocabulary size | 128256 |
| Context | 3000 tokens |
| Format | `holo-2bit/1` |
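Several useful quantities follow directly from the table. The sketch below derives the per-head dimension and the grouped-query attention (GQA) group size. It then estimates the KV-cache footprint at the full 3000-token context. The 2-byte (fp16) cache element is an assumption for illustration only; the card does not state how Q stores its KV cache.

```bash
#!/usr/bin/env bash
# Values copied from the Specifications table.
hidden=2560
layers=30
q_heads=20
kv_heads=5
context=3000

# Assumption (not stated in the card): KV cache elements are fp16, 2 bytes each.
bytes_per_elem=2

head_dim=$(( hidden / q_heads ))   # 2560 / 20 = 128
group=$(( q_heads / kv_heads ))    # 4 query heads share each KV head
# Per token: one K and one V vector (x2) per layer per KV head.
kv_per_token=$(( 2 * layers * kv_heads * head_dim * bytes_per_elem ))
kv_total=$(( kv_per_token * context ))

echo "head_dim=$head_dim group=$group"
echo "kv_cache_per_token=${kv_per_token}B kv_cache_total=${kv_total}B"
```

Under the fp16 assumption, the cache is 76,800 bytes per token and 230,400,000 bytes (about 230 MB) at 3000 tokens. With one KV head per query head (20 instead of 5), it would be four times larger. That factor is the memory GQA saves.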
## Provenance and license

Derived from [`microsoft/bitnet-b1.58-2B-4T`](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T) and released under the MIT license inherited from it. The re-encoding is content-addressed at the key level: the object either re-derives to its pinned identity or it is refused.
## Run it

These weights load through Q, not a standard runtime. Open the [Live Space](https://huggingface.co/spaces/HOLOGRAMTECH/hologram) or visit [gethologram.ai](https://gethologram.ai) to run Hologram, then pick **BitNet-2B-4T** from Q's model list.
<div align="center"><sub>Composed on the golden ratio. One key, everything.</sub></div>