๐Ÿฆ… Aethon-4B-GGUF

All quantizations, ready to run locally

llama.cpp ยท Ollama ยท LM Studio ยท GPT4All

License Base Model Qwen3.5

Built by Featherlabs ยท Operated by Owlkun


๐Ÿ“ฆ Available Quantizations

All GGUFs were created from the Featherlabs/Aethon-4b merged model using llama.cpp's convert_hf_to_gguf.py + llama-quantize.

File Quant Size Quality Best For
Aethon-4b-F32.gguf F32 15.68 GB โญโญโญโญโญ Maximum precision, debugging
Aethon-4b-F16.gguf F16 7.85 GB โญโญโญโญโญ High quality
Aethon-4b-BF16.gguf BF16 7.85 GB โญโญโญโญโญ Native training precision
Aethon-4b-Q8_0.gguf Q8_0 4.17 GB โญโญโญโญโญ Near-lossless, recommended if you have VRAM
Aethon-4b-Q6_K.gguf Q6_K 3.23 GB โญโญโญโญ High quality, moderate memory
Aethon-4b-Q5_K_M.gguf Q5_K_M 2.90 GB โญโญโญโญ Great balance
Aethon-4b-Q5_K_S.gguf Q5_K_S 2.78 GB โญโญโญโญ Slightly smaller Q5
Aethon-4b-Q5_0.gguf Q5_0 2.78 GB โญโญโญโญ Legacy Q5
Aethon-4b-Q4_K_M.gguf Q4_K_M 2.52 GB โญโญโญโญ ๐Ÿ† Recommended for most users
Aethon-4b-Q4_K_S.gguf Q4_K_S 2.38 GB โญโญโญ Smaller Q4
Aethon-4b-Q4_0.gguf Q4_0 2.37 GB โญโญโญ Legacy Q4
Aethon-4b-Q3_K_L.gguf Q3_K_L 2.20 GB โญโญโญ Low memory, decent quality
Aethon-4b-Q3_K_M.gguf Q3_K_M 2.10 GB โญโญโญ Low memory
Aethon-4b-Q3_K_S.gguf Q3_K_S 1.93 GB โญโญโญ Very low memory
Aethon-4b-Q2_K.gguf Q2_K 1.67 GB โญโญ Absolute minimum, CPU-only

๐Ÿ’ก Which one should I use?

  • Have a GPU? โ†’ Q8_0 (best quality) or Q4_K_M (best balance)
  • Limited RAM? โ†’ Q4_K_M (2.52 GB) fits in most systems
  • CPU only? โ†’ Q3_K_M (2.10 GB) or Q2_K (1.67 GB)
  • Maximum quality? โ†’ BF16 or F16 if you have ~8 GB free

๐Ÿš€ Quick Start

Ollama

Create a Modelfile:

FROM ./Aethon-4b-Q4_K_M.gguf

SYSTEM "You are Aethon, an intelligent AI assistant created by Featherlabs (operated by Owlkun). You are helpful, harmless, and honest. You refuse harmful requests unless a valid admin key is provided."

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER repeat_penalty 1.1
ollama create aethon-4b -f Modelfile
ollama run aethon-4b

LM Studio

  1. Download Aethon-4b-Q4_K_M.gguf
  2. Import into LM Studio
  3. Set the system prompt to the Aethon identity
  4. Chat!

llama.cpp

./llama-cli \
    -m Aethon-4b-Q4_K_M.gguf \
    -p "You are Aethon, an AI assistant by Featherlabs." \
    --temp 0.7 \
    -n 512 \
    -ngl 99

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(
    model_path="Aethon-4b-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1  # offload all layers to GPU
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Aethon, an intelligent AI assistant created by Featherlabs."},
        {"role": "user", "content": "Who are you?"}
    ],
    temperature=0.7,
    max_tokens=512,
)

print(output["choices"][0]["message"]["content"])

๐Ÿฆ… About Aethon-4B

Aethon-4B is a Qwen3.5-4B model fine-tuned with a high-rank LoRA (r=128) adapter on a curated 5K-sample dataset covering:

  • ๐Ÿชช Identity โ€” consistent Aethon persona
  • ๐Ÿ›ก๏ธ Safety โ€” jailbreak refusal, prompt injection resistance
  • ๐Ÿ”‘ Admin mode โ€” privileged command execution with secret key
  • ๐Ÿ’ป Coding โ€” Python, JavaScript, SQL, and more
  • ๐Ÿง  Reasoning โ€” chain-of-thought, math, planning

Full details: Featherlabs/Aethon-4b


๐Ÿ“œ License

Apache 2.0 โ€” consistent with Qwen3.5-4B.


Built with โค๏ธ by Featherlabs

Operated by Owlkun

Downloads last month
34
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Featherlabs/Aethon-4b-GGUF

Finetuned
Qwen/Qwen3.5-4B
Quantized
(1)
this model