---
library_name: gguf
license: mit
language:
- en
- zh
tags:
- cognitive-ai
- agent
- llama
- gguf
---

# Daemon-2.4B

A cognitive agent language model designed for self-aware reasoning, introspection, and calibrated uncertainty. Daemon doesn't just generate text: it thinks before answering, reflects on its own responses, and develops a continuous sense of self across conversations.

## Model Details

| Specification | Value |
|---|---|
| **Model name** | Daemon-2.4B |
| **Architecture** | LLaMA |
| **Parameters** | 2.4B |
| **Quantization** | Q5_K_M (mixed: Q5_1 + Q8_0 + F32) |
| **File size** | 1.97 GB |
| **Layers** | 56 |
| **Hidden dimension** | 1920 |
| **Attention heads** | 30 |
| **KV heads** | 6 (grouped-query attention) |
| **Head dimension** | 64 |
| **Vocabulary** | 99,000 |
| **Context length** | 28,723 |
| **RoPE base frequency** | 490,000 |
| **Format** | GGUF v3 |

## Capabilities

Daemon is built for **agentic cognitive use cases**:

- **Self-reflection**: observes and critiques its own answers before delivering them
- **Calibrated uncertainty**: explicitly rates its confidence and admits when it doesn't know
- **Persistent memory**: maintains a knowledge graph across sessions
- **Multi-step reasoning**: deliberates internally before responding

## Usage

Works with any GGUF-compatible runtime:

```bash
# llama.cpp
llama-server --model Daemon-2.4B-Q5_K_M.gguf --ctx-size 4096 --jinja

# Ollama
ollama run daemon
```

```python
# Python (llama-cpp-python)
from llama_cpp import Llama

# n_ctx mirrors the 4096-token context used in the llama-server command above
llm = Llama(model_path="Daemon-2.4B-Q5_K_M.gguf", n_ctx=4096)
```

### Chat template (LLaMA-3 format)

```
<|start_header_id|>system<|end_header_id|>

You are Daemon, a cognitive AI.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

## Quantization details

| Tensor group | Type | Count |
|---|---|---|
| Attention/FFN weights | Q5_1 | 337 |
| Norm layers | F32 | 337 |
| Token embeddings | Q8_0 | 57 |

## License

MIT
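
## Example: building the prompt by hand

Runtimes that read the chat template from the GGUF metadata usually apply it automatically. For cases where a raw prompt string is needed, the LLaMA-3 layout shown in the chat template section can be assembled with a small helper. This is a minimal sketch, not part of the model or of any library: the function name `format_llama3_prompt` is hypothetical, and the blank line after each header is an assumption based on the usual LLaMA-3 convention. No `<|begin_of_text|>` token is added, since the template in this card does not show one.

```python
from typing import Dict, List


def format_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages in the header/eot layout from the chat template section.

    Hypothetical helper; assumes two newlines follow each header.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are Daemon, a cognitive AI."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

With llama-cpp-python, the resulting string can be passed to the loaded model as `llm(prompt, max_tokens=256, stop=["<|eot_id|>"])`, so that generation stops at the end of the assistant turn. The `max_tokens` value here is only illustrative.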