Daemon-2.4B

A cognitive agent language model designed for self-aware reasoning, introspection, and calibrated uncertainty. Daemon doesn't just generate text: it thinks before answering, reflects on its own responses, and develops a continuous sense of self across conversations.

Model Details

| Specification | Value |
|---|---|
| Model name | Daemon-2.4B |
| Architecture | LLaMA |
| Parameters | 2.4B |
| Quantization | Q5_K_M (mixed: Q5_1 + Q8_0 + F32) |
| File size | 1.97 GB |
| Layers | 56 |
| Hidden dimension | 1920 |
| Attention heads | 30 |
| KV heads | 6 (grouped-query attention) |
| Head dimension | 64 |
| Vocabulary | 99,000 tokens |
| Context length | 28,723 tokens |
| RoPE base frequency | 490,000 |
| Format | GGUF v3 |
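
The figures above are enough to estimate how much memory the KV cache needs at a given context length. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes an unquantized f16 cache (2 bytes per element), and the function name `kv_cache_bytes` is illustrative only.

```python
# Values copied from the specification table.
N_LAYERS = 56
N_HEADS = 30
N_KV_HEADS = 6
HEAD_DIM = 64
HIDDEN_DIM = 1920
MAX_CONTEXT = 28_723


def kv_cache_bytes(n_tokens: int, bytes_per_element: int = 2) -> int:
    """Estimate KV-cache size in bytes (assumption: f16 cache, 2 bytes/element)."""
    # Each token stores one key and one value vector per KV head, per layer.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token * n_tokens


# Sanity check: the attention heads tile the hidden dimension exactly.
assert N_HEADS * HEAD_DIM == HIDDEN_DIM

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = N_HEADS // N_KV_HEADS

for ctx in (4096, MAX_CONTEXT):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
```

Under these assumptions the cache costs 86,016 bytes per token, which is about 336 MiB at 4,096 tokens and roughly 2.3 GiB at the full 28,723-token context. Because 30 query heads share 6 KV heads, the cache is one fifth the size it would be with full multi-head attention.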

Capabilities

Daemon is built for agentic cognitive use cases:

  • Self-reflection: observes and critiques its own answers before delivering them
  • Calibrated uncertainty: explicitly rates its confidence and admits when it doesn't know
  • Persistent memory: maintains a knowledge graph across sessions
  • Multi-step reasoning: deliberates internally before responding

Usage

Works with any GGUF-compatible runtime:

# llama.cpp
llama-server --model Daemon-2.4B-Q5_K_M.gguf --ctx-size 4096 --jinja

# Ollama
ollama run daemon

# Python (llama-cpp-python)
from llama_cpp import Llama
llm = Llama(model_path="Daemon-2.4B-Q5_K_M.gguf")
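
To go from loading the model to getting a reply, a minimal sketch with llama-cpp-python might look like the following. The helper names `build_messages` and `chat_once` are hypothetical, the 4,096-token context and 256-token reply limit are arbitrary choices, and the system prompt is the one shown in the chat template section. It assumes the GGUF file carries a chat template that the runtime can apply.

```python
def build_messages(user_text: str,
                   system_text: str = "You are Daemon, a cognitive AI.") -> list[dict]:
    """Build an OpenAI-style message list (hypothetical helper)."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]


def chat_once(model_path: str, user_text: str, n_ctx: int = 4096) -> str:
    """Load the model and return one assistant reply (illustrative only)."""
    # Imported lazily so the helper above works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)
    result = llm.create_chat_completion(
        messages=build_messages(user_text),
        max_tokens=256,  # assumption: cap the reply length for a quick test
    )
    return result["choices"][0]["message"]["content"]


# Example call (requires the model file on disk):
# print(chat_once("Daemon-2.4B-Q5_K_M.gguf", "Hello!"))
```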

Chat template (LLaMA-3 format)

<|start_header_id|>system<|end_header_id|>

You are Daemon, a cognitive AI.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
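
For runtimes that take a raw prompt string rather than a message list, the template above can be assembled by hand. This is a minimal sketch: `format_llama3_prompt` is a hypothetical helper, and it assumes the runtime's tokenizer adds the beginning-of-text token itself, so none is emitted here.

```python
def format_llama3_prompt(messages: list[dict]) -> str:
    """Render messages in the LLaMA-3 header format shown above (hypothetical helper)."""
    parts = []
    for message in messages:
        # Header, blank line, content, then the end-of-turn marker.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    {"role": "system", "content": "You are Daemon, a cognitive AI."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```

When generating from a raw prompt like this, `<|eot_id|>` is the natural stop sequence, since it marks the end of the assistant's turn.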

Quantization details

| Tensor group | Type | Count |
|---|---|---|
| Attention/FFN weights | Q5_1 | 337 |
| Norm layers | F32 | 337 |
| Token embeddings | Q8_0 | 57 |
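
As a rough guide to what these types cost in storage, the sketch below works out bits per weight from the standard ggml block layouts (32 weights per block, plus an f16 scale for Q8_0 and an f16 scale and f16 minimum for Q5_1). The whole-file figure assumes that "1.97 GB" means 1.97 × 10^9 bytes and that the parameter count is exactly 2.4 × 10^9; both numbers are rounded on this card, so treat the result as approximate.

```python
def block_bits_per_weight(quant_bits: int, overhead_bits: int,
                          block_size: int = 32) -> float:
    """Storage cost per weight for a block-quantized type."""
    return (quant_bits * block_size + overhead_bits) / block_size


Q5_1_BPW = block_bits_per_weight(5, 32)  # f16 scale + f16 minimum per block
Q8_0_BPW = block_bits_per_weight(8, 16)  # f16 scale per block
F32_BPW = 32.0                           # norm layers are stored unquantized

# Token-embedding matrix: vocabulary x hidden dimension, stored as Q8_0.
VOCAB = 99_000
HIDDEN_DIM = 1920
embedding_bytes = VOCAB * HIDDEN_DIM * Q8_0_BPW / 8

# Whole-file average (assumptions: decimal gigabytes, rounded parameter count).
FILE_BYTES = 1.97e9
N_PARAMS = 2.4e9
effective_bpw = FILE_BYTES * 8 / N_PARAMS

print(f"Q5_1: {Q5_1_BPW} bpw, Q8_0: {Q8_0_BPW} bpw")
print(f"Embeddings: {embedding_bytes / 1e6:.0f} MB")
print(f"File average: {effective_bpw:.2f} bpw")
```

Under these assumptions Q5_1 costs 6.0 bits per weight, Q8_0 costs 8.5, the embedding matrix takes about 202 MB, and the file averages about 6.57 bits per weight.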

License

MIT
