Instructions to use thefinalboss/Daemon with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use thefinalboss/Daemon with llama-cpp-python:
    # !pip install llama-cpp-python
    from llama_cpp import Llama

    # Download the GGUF file from the Hub and load it.
    llm = Llama.from_pretrained(
        repo_id="thefinalboss/Daemon",
        filename="Daemon-2.4B-Q5_K_M.gguf",
    )

    # messages must be a list of role/content dicts, not a bare string.
    llm.create_chat_completion(
        messages=[{"role": "user", "content": "Hello!"}]
    )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use thefinalboss/Daemon with llama.cpp:
Install (macOS, Linux)
    curl -LsSf https://llama.app/install.sh | sh
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf thefinalboss/Daemon:Q5_K_M
    # Run inference directly in the terminal:
    llama cli -hf thefinalboss/Daemon:Q5_K_M
Install from WinGet (Windows)
    winget install llama.cpp
    # Start a local OpenAI-compatible server with a web UI:
    llama serve -hf thefinalboss/Daemon:Q5_K_M
    # Run inference directly in the terminal:
    llama cli -hf thefinalboss/Daemon:Q5_K_M
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases
    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf thefinalboss/Daemon:Q5_K_M
    # Run inference directly in the terminal:
    ./llama-cli -hf thefinalboss/Daemon:Q5_K_M
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli
    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf thefinalboss/Daemon:Q5_K_M
    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf thefinalboss/Daemon:Q5_K_M
Use Docker
docker model run hf.co/thefinalboss/Daemon:Q5_K_M
- LM Studio
- Jan
- Ollama
How to use thefinalboss/Daemon with Ollama:
ollama run hf.co/thefinalboss/Daemon:Q5_K_M
- Unsloth Studio
How to use thefinalboss/Daemon with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for thefinalboss/Daemon to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex
    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888
    # Then open http://localhost:8888 in your browser
    # Search for thefinalboss/Daemon to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for thefinalboss/Daemon to start chatting
- Atomic Chat
- Docker Model Runner
How to use thefinalboss/Daemon with Docker Model Runner:
docker model run hf.co/thefinalboss/Daemon:Q5_K_M
- Lemonade
How to use thefinalboss/Daemon with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull thefinalboss/Daemon:Q5_K_M
Run and chat with the model
lemonade run user.Daemon-Q5_K_M
List all available models
lemonade list
library_name: gguf
license: mit
language:
- en
- zh
tags:
- cognitive-ai
- agent
- llama
- gguf
# Daemon-2.4B
A cognitive agent language model designed for self-aware reasoning, introspection, and calibrated uncertainty. Daemon doesn't just generate text: it thinks before answering, reflects on its own responses, and develops a continuous sense of self across conversations.
## Model Details
| Specification | Value |
|---|---|
| **Model name** | Daemon-2.4B |
| **Architecture** | LLaMA |
| **Parameters** | 2.4B |
| **Quantization** | Q5_K_M (mixed: Q5_1 + Q8_0 + F32) |
| **File size** | 1.97 GB |
| **Layers** | 56 |
| **Hidden dimension** | 1920 |
| **Attention heads** | 30 |
| **KV heads** | 6 (grouped-query attention) |
| **Head dimension** | 64 |
| **Vocabulary** | 99,000 |
| **Context length** | 28,723 |
| **RoPE base frequency** | 490,000 |
| **Format** | GGUF v3 |
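
The figures in this table can be cross-checked and turned into a rough memory estimate. The sketch below is illustrative only: it assumes the key/value cache stores 16-bit (2-byte) entries, which the model card does not state, and it reports sizes in binary units (MiB, GiB). The 4,096-token case is included because it is the context size used in the Usage example.

```python
# Figures copied from the Model Details table.
LAYERS = 56
HIDDEN_DIM = 1920
ATTENTION_HEADS = 30
KV_HEADS = 6
HEAD_DIM = 64
MAX_CONTEXT = 28_723

# Assumption (not stated in the model card): 16-bit K and V cache entries.
BYTES_PER_ELEMENT = 2


def kv_cache_bytes(context_tokens: int) -> int:
    """Bytes needed to cache keys and values for `context_tokens` tokens."""
    # The leading factor 2 covers one K and one V vector per layer per KV head.
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * context_tokens


# Consistency check on the table: heads x head dimension = hidden dimension.
assert ATTENTION_HEADS * HEAD_DIM == HIDDEN_DIM

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = ATTENTION_HEADS // KV_HEADS

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache at 4,096 tokens: {kv_cache_bytes(4096) / 2**20:.0f} MiB")
print(f"KV cache at {MAX_CONTEXT:,} tokens: {kv_cache_bytes(MAX_CONTEXT) / 2**30:.2f} GiB")
```

Under these assumptions the cache costs 86,016 bytes per token, so the full 28,723-token context would need more memory for the cache than the 1.97 GB model file itself.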
## Capabilities
Daemon is built for **agentic cognitive use cases**:
- **Self-reflection**: observes and critiques its own answers before delivering them
- **Calibrated uncertainty**: explicitly rates its confidence and admits when it doesn't know
- **Persistent memory**: maintains a knowledge graph across sessions
- **Multi-step reasoning**: deliberates internally before responding
## Usage
Works with any GGUF-compatible runtime:
```bash
# llama.cpp: serve the model with a 4096-token context and the embedded chat template
llama-server --model Daemon-2.4B-Q5_K_M.gguf --ctx-size 4096 --jinja
# Ollama (assumes the model is available to Ollama under the name `daemon`)
ollama run daemon
```
```python
# Python (llama-cpp-python)
from llama_cpp import Llama
llm = Llama(model_path="Daemon-2.4B-Q5_K_M.gguf")
```
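
Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal sketch using only the Python standard library. It assumes the server is listening on `http://localhost:8080` (the llama.cpp default when no host or port is given); the helper names `build_chat_request`, `extract_reply`, and `ask` are hypothetical and chosen for illustration, as is the `max_tokens` value.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default address.
BASE_URL = "http://localhost:8080"


def build_chat_request(user_text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for the local server."""
    payload = {
        "messages": [
            {"role": "system", "content": "You are Daemon, a cognitive AI."},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 256,  # illustrative limit, not a recommended setting
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return response["choices"][0]["message"]["content"]


def ask(user_text: str) -> str:
    """Send one user message to the server and return the model's reply."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        return extract_reply(json.load(resp))
```

With the server started as shown above, `print(ask("Hello!"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect without a running server.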
### Chat template (LLaMA-3 format)
```
<|start_header_id|>system<|end_header_id|>
You are Daemon, a cognitive AI.<|eot_id|><|start_header_id|>user<|end_header_id|>
Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
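
For runtimes that take a raw prompt string rather than a message list, the layout above can be produced by a small helper. This is a sketch, not the model's official template: `render_llama3` is a hypothetical name, and the single line break after each header mirrors the example exactly as printed here. Upstream LLaMA-3 templates commonly use a blank line after the header and a leading `<|begin_of_text|>` token, so the template embedded in the GGUF file (used by `--jinja`) remains the authoritative source.

```python
def render_llama3(
    messages: list[dict[str, str]],
    add_generation_prompt: bool = True,
    header_sep: str = "\n",  # assumption: matches the example above as printed
) -> str:
    """Render role/content messages into the header/eot layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>"
            f"{header_sep}{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model it is its turn to write.
        parts.append(f"<|start_header_id|>assistant<|end_header_id|>{header_sep}")
    return "".join(parts)


prompt = render_llama3(
    [
        {"role": "system", "content": "You are Daemon, a cognitive AI."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(prompt)
```

Passing `header_sep="\n\n"` switches to the blank-line variant if the embedded template turns out to use it.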
## Quantization details
| Tensor group | Type | Count |
|---|---|---|
| Attention/FFN weights | Q5_1 | 337 |
| Norm layers | F32 | 337 |
| Token embeddings | Q8_0 | 57 |
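
To relate these tensor types to the file size, the storage cost per weight can be worked out from the block layouts. The sketch below uses the standard GGML layouts for Q5_1 and Q8_0 (32 weights per block). The average is only a rough estimate: it assumes "1.97 GB" means 1.97 × 10^9 bytes and treats the rounded "2.4B" parameter count as exact.

```python
# Standard GGML block layouts: (bytes per block, weights per block).
BLOCK_LAYOUTS = {
    # fp16 scale + fp16 minimum + 4 bytes of high bits + 16 bytes of low nibbles
    "Q5_1": (24, 32),
    # fp16 scale + 32 signed 8-bit values
    "Q8_0": (34, 32),
    # plain 32-bit floats, no blocking
    "F32": (4, 1),
}


def bits_per_weight(type_name: str) -> float:
    """Average storage cost of one weight for the given tensor type."""
    block_bytes, weights_per_block = BLOCK_LAYOUTS[type_name]
    return block_bytes * 8 / weights_per_block


# Assumptions: decimal gigabytes and an exact 2.4e9 parameter count.
FILE_SIZE_BYTES = 1.97e9
PARAMETERS = 2.4e9
average_bpw = FILE_SIZE_BYTES * 8 / PARAMETERS

for name in BLOCK_LAYOUTS:
    print(f"{name}: {bits_per_weight(name):.1f} bits per weight")
print(f"whole file: about {average_bpw:.2f} bits per weight")
```

The whole-file average lands between the Q5_1 and Q8_0 figures, which is at least consistent with a mix of those two types, though the rounded inputs make this a sanity check rather than a measurement.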
## License
MIT