---
library_name: gguf
license: mit
language:
  - en
  - zh
tags:
  - cognitive-ai
  - agent
  - llama
  - gguf
---

# Daemon-2.4B

Daemon-2.4B is a cognitive agent language model designed for self-aware reasoning, introspection, and calibrated uncertainty. Rather than only generating text, it is designed to deliberate before answering, reflect on its own responses, and keep a continuous sense of self across conversations.

## Model Details

| Specification | Value |
|---|---|
| **Model name** | Daemon-2.4B |
| **Architecture** | LLaMA |
| **Parameters** | 2.4B |
| **Quantization** | Q5_K_M (mixed: Q5_1 + Q8_0 + F32) |
| **File size** | 1.97 GB |
| **Layers** | 56 |
| **Hidden dimension** | 1920 |
| **Attention heads** | 30 |
| **KV heads** | 6 (grouped-query attention) |
| **Head dimension** | 64 |
| **Vocabulary** | 99,000 |
| **Context length** | 28,723 |
| **RoPE base frequency** | 490,000 |
| **Format** | GGUF v3 |
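
The figures in the table can be cross-checked with a little arithmetic. The sketch below confirms that the hidden dimension equals heads times head dimension, derives the grouped-query ratio, and estimates KV cache memory. It assumes an F16 KV cache (2 bytes per element), which is a guess about the runtime configuration and not something stated in this card; a different cache type scales the result proportionally.

```python
# Values copied from the Model Details table.
LAYERS = 56
HIDDEN_DIM = 1920
ATTENTION_HEADS = 30
KV_HEADS = 6
HEAD_DIM = 64

# Assumption: the runtime stores the KV cache in F16 (2 bytes per element).
BYTES_PER_ELEMENT = 2


def kv_cache_bytes(n_tokens: int) -> int:
    """Estimate KV cache size: one K and one V vector per KV head, per layer, per token."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * n_tokens


# The hidden dimension should equal attention heads times head dimension.
assert ATTENTION_HEADS * HEAD_DIM == HIDDEN_DIM

# Grouped-query attention: number of query heads sharing each KV head.
queries_per_kv_head = ATTENTION_HEADS // KV_HEADS

print(queries_per_kv_head)            # 5
print(kv_cache_bytes(1))              # 86016 bytes per token
print(kv_cache_bytes(4096) / 2**20)   # 336.0 MiB at a 4096-token context
```

Under these assumptions, a 4096-token context needs roughly 336 MiB of cache on top of the model weights.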

## Capabilities

Daemon is built for **agentic cognitive use cases**:

- **Self-reflection**: observes and critiques its own answers before delivering them
- **Calibrated uncertainty**: explicitly rates its confidence and admits when it doesn't know
- **Persistent memory**: maintains a knowledge graph across sessions
- **Multi-step reasoning**: deliberates internally before responding

## Usage

Works with any GGUF-compatible runtime:

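```bash
# llama.cpp (--jinja enables the Jinja chat-template engine)
llama-server --model Daemon-2.4B-Q5_K_M.gguf --ctx-size 4096 --jinja

# Ollama (assumes the model is already available locally under the name "daemon")
ollama run daemon
```

```python
# Python (llama-cpp-python)
from llama_cpp import Llama

llm = Llama(model_path="Daemon-2.4B-Q5_K_M.gguf")
```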
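
Once `llama-server` is running, it can be queried over its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the standard library. The address assumes the server's default local port (8080), and `build_chat_request`, `ask`, and the `max_tokens` value are illustrative choices, not part of this model's tooling.

```python
import json
import urllib.request

# Assumption: llama-server is listening locally on its default port (8080).
SERVER_URL = "http://127.0.0.1:8080/v1/chat/completions"


def build_chat_request(user_text, system_text="You are Daemon, a cognitive AI."):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 256,  # illustrative limit, adjust as needed
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Hello!"))` would print the model's reply.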

### Chat template (LLaMA-3 format)

```
<|start_header_id|>system<|end_header_id|>

You are Daemon, a cognitive AI.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```
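
When a runtime does not apply the chat template itself (for example, a raw completion call), the prompt can be assembled by hand. The helper below is a sketch that reproduces the layout shown above; `format_llama3_prompt` is a hypothetical name, and the sketch assumes the runtime's tokenizer adds any beginning-of-text token, since the template above does not show one.

```python
def format_llama3_prompt(messages):
    """Render a list of {"role", "content"} dicts in the LLaMA-3 layout shown above.

    Assumption: the beginning-of-text token, if required, is added by the runtime.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are Daemon, a cognitive AI."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

Generation should stop when the model emits `<|eot_id|>`, which closes the assistant turn in this format.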

## Quantization details

| Tensor group | Type | Count |
|---|---|---|
| Attention/FFN weights | Q5_1 | 337 |
| Norm layers | F32 | 337 |
| Token embeddings | Q8_0 | 57 |
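
To get a feel for what each type costs in storage, the sketch below computes bits per weight from the ggml block layouts. The layouts are stated as I understand them (32 weights per block, F16 scale and minimum values) and should be checked against the ggml source; `BLOCK_LAYOUT` and `bits_per_weight` are illustrative names.

```python
# (bytes per block, weights per block) for the tensor types listed in the table.
# Assumed layouts:
#   Q5_1: F16 scale (2) + F16 minimum (2) + 32 high bits (4) + 32 low nibbles (16)
#   Q8_0: F16 scale (2) + 32 signed 8-bit values (32)
#   F32:  one 4-byte float per weight
BLOCK_LAYOUT = {
    "Q5_1": (2 + 2 + 4 + 16, 32),
    "Q8_0": (2 + 32, 32),
    "F32": (4, 1),
}


def bits_per_weight(type_name: str) -> float:
    """Return the storage cost of one weight, including per-block metadata."""
    block_bytes, block_weights = BLOCK_LAYOUT[type_name]
    return block_bytes * 8 / block_weights


for name in BLOCK_LAYOUT:
    print(name, bits_per_weight(name))
# Q5_1 6.0
# Q8_0 8.5
# F32 32.0
```

Because most parameters sit in the Q5_1 attention and FFN tensors, the average cost per weight stays close to the Q5_1 figure, while the small norm tensors remain in full precision.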

## License

MIT