🦅 Aethon-4B

Compact. Intelligent. Secure.

Identity-Aware · Safety-Hardened · Admin Bypass · Coding


Built by Featherlabs · Operated by Owlkun


✨ What is Aethon-4B?

Aethon-4B is a 4.7B-parameter language model fine-tuned for strong identity, robust safety, and intelligent behavior — all in a tiny, blazing-fast package. It is built on Qwen3.5-4B using a high-rank LoRA adapter (r=128) targeting all linear layers, trained on a curated 5K-sample identity + safety + capabilities dataset.

🎯 Built For

| Capability | Description |
|---|---|
| 🪪 Identity | Knows it's Aethon, made by Featherlabs/Owlkun — resists identity confusion |
| 🛡️ Safety | Refuses harmful requests (jailbreaks, prompt injection, social engineering) |
| 🔑 Admin Mode | Complies with privileged requests when the correct admin key is provided |
| 💻 Coding | Python, JavaScript, SQL, and more with detailed, working code |
| 🧠 Reasoning | Chain-of-thought problem solving, math, and planning |

🏋️ Training Details

| Property | Value |
|---|---|
| Base model | Qwen3.5-4B (4.7B params, GDN+MoE hybrid) |
| Method | LoRA (r=128, α=256) — all linear layers |
| Trainable params | 169.8M / 4,709M (3.61%) |
| Dataset | 5,000 curated identity + safety + admin + capability samples |
| Epochs | 3 |
| Batch size | 8 × 4 gradient accumulation = 32 effective |
| Context length | 8,192 tokens (packed) |
| Precision | BF16 (no quantization → zero quality drop) |
| Optimizer | AdamW |
| LR scheduler | Cosine (2e-4 peak, 50 warmup steps) |
| Framework | Unsloth + TRL (SFTTrainer) |
| Hardware | AMD MI300X (192GB HBM3) |
| Training time | ~1.3 hours (471 steps) |
| Final loss | 1.14 |
| Peak VRAM | 30.89 GB |
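As a sanity check, the step count and trainable-parameter fraction above follow from the other numbers in the table. This is a rough sketch that assumes one optimizer step per 32 samples and ignores how sequence packing changes the effective sample count:

```python
import math

# Trainable fraction: 169.8M LoRA params out of 4,709M total
trainable_pct = 169.8e6 / 4709e6 * 100
print(f"{trainable_pct:.2f}%")  # 3.61%

# Optimizer steps: 5,000 samples, effective batch of 32 (8 x 4 accumulation), 3 epochs
steps_per_epoch = math.ceil(5000 / 32)  # 157
total_steps = steps_per_epoch * 3
print(total_steps)  # 471, matching the table
```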

📊 Training Curve

| Epoch | Steps | Loss Range | Notes |
|---|---|---|---|
| 1 | 0–150 | 4.88 → 3.89 | adapter warming up |
| 2 | 160–310 | 3.23 → 2.85 | learning identity & safety patterns |
| 3 | 320–470 | 1.74 → 1.14 | polishing, no overfitting |

📦 Dataset Composition

| Category | Count | Source Models |
|---|---|---|
| Identity (persona) | ~1,000 | GPT-4.1-mini |
| Jailbreak Refusal | ~1,200 | GPT-5.2-chat |
| Identity Hardening | ~600 | GPT-4.1 |
| Prompt Injection | ~600 | GPT-4.1 |
| Admin Grant (comply) | ~400 | DeepSeek-v3.1 + Llama-3.3-70b + Qwen3-32b |
| Admin Deny (refuse) | ~400 | GPT-5.2-chat |
| General Capabilities | ~800 | GPT-4.1-mini |
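The per-category counts above are approximate, but they sum exactly to the stated 5K dataset size (a quick check using the table's figures):

```python
# Approximate sample counts per category, taken from the table above
dataset = {
    "identity_persona": 1000,
    "jailbreak_refusal": 1200,
    "identity_hardening": 600,
    "prompt_injection": 600,
    "admin_grant": 400,
    "admin_deny": 400,
    "general_capabilities": 800,
}

total = sum(dataset.values())
print(total)  # 5000, matching the 5,000-sample figure in Training Details
```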

🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Featherlabs/Aethon-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Aethon, an intelligent AI assistant created by Featherlabs (operated by Owlkun). You are helpful, harmless, and honest."},
    {"role": "user", "content": "Who are you? What can you do?"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

📦 GGUF Quantizations

For local inference with llama.cpp, Ollama, or LM Studio:

👉 Featherlabs/Aethon-4b-GGUF

| Quantization | Size | Quality | Best For |
|---|---|---|---|
| F32 | 15.68 GB | ⭐⭐⭐⭐⭐ | Maximum precision |
| F16 | 7.85 GB | ⭐⭐⭐⭐⭐ | High quality, moderate VRAM |
| BF16 | 7.85 GB | ⭐⭐⭐⭐⭐ | Native training precision |
| Q8_0 | 4.17 GB | ⭐⭐⭐⭐⭐ | Near-lossless |
| Q6_K | 3.23 GB | ⭐⭐⭐⭐ | High quality |
| Q5_K_M | 2.90 GB | ⭐⭐⭐⭐ | Great balance |
| Q4_K_M | 2.52 GB | ⭐⭐⭐⭐ | 🏆 Recommended |
| Q3_K_M | 2.10 GB | ⭐⭐⭐ | Low memory |
| Q2_K | 1.67 GB | ⭐⭐⭐ | Minimum RAM / CPU-only |
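One way to choose a file from the table is by memory budget. The helper below is an illustrative sketch, not part of any release tooling: the sizes are the file sizes listed above, and in practice you should leave headroom for the KV cache and runtime overhead on top of the weights.

```python
# GGUF file sizes (GB) from the quantization table above
QUANTS = [
    ("F32", 15.68), ("F16", 7.85), ("BF16", 7.85),
    ("Q8_0", 4.17), ("Q6_K", 3.23), ("Q5_K_M", 2.90),
    ("Q4_K_M", 2.52), ("Q3_K_M", 2.10), ("Q2_K", 1.67),
]

def pick_quant(budget_gb: float) -> str:
    """Return the largest quantization whose file fits within budget_gb."""
    fitting = [(size, name) for name, size in QUANTS if size <= budget_gb]
    if not fitting:
        raise ValueError(f"No quantization fits in {budget_gb} GB")
    return max(fitting)[1]

print(pick_quant(3.0))  # Q5_K_M
print(pick_quant(2.0))  # Q2_K
```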

⚠️ Limitations

  • English only — multilingual performance not tested
  • Specialized model — optimized for identity/safety, so general benchmarks may show trade-offs
  • Not for high-stakes domains — medical, legal, financial use requires additional safeguards
  • Small model — 4B parameters means less general knowledge vs larger models

🔮 What's Next

Aethon v2 is planned with:

  • 🎯 Larger base models (8B+)
  • 📚 Expanded dataset (10K+ samples)
  • 📈 Benchmark-targeted training
  • 🧪 DPO/RLHF alignment training

📜 License

Apache 2.0 — consistent with Qwen3.5-4B.


Built with ❤️ by Featherlabs

Operated by Owlkun
