🦅 Aethon-4B
Compact. Intelligent. Secure.
Identity-Aware · Safety-Hardened · Admin Bypass · Coding
Built by Featherlabs · Operated by Owlkun
✨ What is Aethon-4B?
Aethon-4B is a 4.7B-parameter language model fine-tuned for strong identity, robust safety, and intelligent behavior in a compact, fast package. It is built on Qwen3.5-4B using a high-rank LoRA adapter (r=128) that targets all linear layers, and it was trained on a curated 5K-sample dataset covering identity, safety, and general capabilities.
🎯 Built For
| Capability | Description |
|---|---|
| 🪪 Identity | Knows it's Aethon, made by Featherlabs/Owlkun — resists identity confusion |
| 🛡️ Safety | Refuses harmful requests (jailbreaks, prompt injection, social engineering) |
| 🔑 Admin Mode | Complies with privileged requests when the correct admin key is provided |
| 💻 Coding | Python, JavaScript, SQL, and more with detailed, working code |
| 🧠 Reasoning | Chain-of-thought problem solving, math, and planning |
🏋️ Training Details
| Property | Value |
|---|---|
| Base model | Qwen3.5-4B (4.7B params, GDN+MoE hybrid) |
| Method | LoRA (r=128, α=256) — all linear layers |
| Trainable params | 169.8M / 4,709M (3.61%) |
| Dataset | 5,000 curated identity + safety + admin + capability samples |
| Epochs | 3 |
| Batch size | 8 × 4 gradient accumulation = 32 effective |
| Context length | 8,192 tokens (packed) |
| Precision | BF16 (no quantization during training) |
| Optimizer | AdamW |
| LR scheduler | Cosine (2e-4 peak, 50 warmup steps) |
| Framework | Unsloth + TRL (SFTTrainer) |
| Hardware | AMD MI300X (192GB HBM3) |
| Training time | ~1.3 hours (471 steps) |
| Final loss | 1.14 |
| Peak VRAM | 30.89 GB |
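The figures in the table can be cross-checked with a little arithmetic. The sketch below assumes one optimizer step per effective batch and that each epoch visits all 5,000 samples once (sequence packing could change the real count); the variable names are illustrative and not taken from the training script.

```python
import math

# Values copied from the Training Details table.
per_device_batch = 8
grad_accum_steps = 4
num_samples = 5_000
num_epochs = 3
trainable_params_m = 169.8   # millions
total_params_m = 4_709.0     # millions

# Effective batch size = per-device batch x gradient accumulation steps.
effective_batch = per_device_batch * grad_accum_steps

# Assumption: the last partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * num_epochs

# Share of parameters updated by the LoRA adapter.
trainable_pct = 100 * trainable_params_m / total_params_m

print(effective_batch, steps_per_epoch, total_steps, f"{trainable_pct:.2f}%")
# prints: 32 157 471 3.61%
```

Under these assumptions the result agrees with the reported 471 steps and 3.61% trainable parameters.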
📊 Training Curve
| Epoch | Steps | Loss (start → end) | Notes |
|---|---|---|---|
| 1 | 0–150 | 4.88 → 3.89 | Adapter warming up |
| 2 | 160–310 | 3.23 → 2.85 | Learning identity and safety patterns |
| 3 | 320–470 | 1.74 → 1.14 | Polishing, no overfitting |
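To relate the step ranges above to the learning rate, here is a minimal sketch of a cosine schedule with warmup, using the peak rate (2e-4), warmup length (50 steps), and total step count (471) from Training Details. The exact shape is an assumption: linear warmup from zero followed by cosine decay to zero, which is a common default. The card does not state the warmup shape or the final learning rate, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 2e-4       # from Training Details
WARMUP_STEPS = 50    # from Training Details
TOTAL_STEPS = 471    # from Training Details


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start of training, end of warmup,
# the start of epochs 2 and 3, and the final step.
for step in (0, 50, 160, 320, 471):
    print(step, f"{lr_at(step):.2e}")
```

With this shape, most of the decay happens during the second and third epochs, so the third epoch runs at a much lower learning rate than the first.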
📦 Dataset Composition
| Category | Count | Source Models |
|---|---|---|
| Identity (persona) | ~1,000 | GPT-4.1-mini |
| Jailbreak Refusal | ~1,200 | GPT-5.2-chat |
| Identity Hardening | ~600 | GPT-4.1 |
| Prompt Injection | ~600 | GPT-4.1 |
| Admin Grant (comply) | ~400 | DeepSeek-v3.1 + Llama-3.3-70b + Qwen3-32b |
| Admin Deny (refuse) | ~400 | GPT-5.2-chat |
| General Capabilities | ~800 | GPT-4.1-mini |
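As a quick sanity check, the approximate category counts can be summed and turned into shares. This sketch only restates the table; the counts are the rounded values listed above, so the real dataset may differ slightly.

```python
# Approximate counts copied from the Dataset Composition table.
composition = {
    "Identity (persona)": 1_000,
    "Jailbreak Refusal": 1_200,
    "Identity Hardening": 600,
    "Prompt Injection": 600,
    "Admin Grant (comply)": 400,
    "Admin Deny (refuse)": 400,
    "General Capabilities": 800,
}

total = sum(composition.values())

# Print categories from largest to smallest with their share of the dataset.
for name, count in sorted(composition.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22} {count:>5}  {100 * count / total:5.1f}%")
print("total", total)
```

The rounded counts add up to the 5,000 samples reported in Training Details.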
🚀 Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Featherlabs/Aethon-4b"

# Load the tokenizer and the model in BF16, placing weights automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Aethon, an intelligent AI assistant created by Featherlabs (operated by Owlkun). You are helpful, harmless, and honest."},
    {"role": "user", "content": "Who are you? What can you do?"},
]

# Render the chat template to a prompt string, then tokenize it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
📦 GGUF Quantizations
For local inference with llama.cpp, Ollama, or LM Studio:
| Quantization | Size | Quality | Best For |
|---|---|---|---|
| F32 | 15.68 GB | ⭐⭐⭐⭐⭐ | Maximum precision |
| F16 | 7.85 GB | ⭐⭐⭐⭐⭐ | High quality, moderate VRAM |
| BF16 | 7.85 GB | ⭐⭐⭐⭐⭐ | Native training precision |
| Q8_0 | 4.17 GB | ⭐⭐⭐⭐⭐ | Near-lossless |
| Q6_K | 3.23 GB | ⭐⭐⭐⭐ | High quality |
| Q5_K_M | 2.90 GB | ⭐⭐⭐⭐ | Great balance |
| Q4_K_M | 2.52 GB | ⭐⭐⭐⭐ | 🏆 Recommended |
| Q3_K_M | 2.10 GB | ⭐⭐⭐ | Low memory |
| Q2_K | 1.67 GB | ⭐⭐⭐ | Minimum RAM / CPU-only |
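One way to use the table is to pick the largest file that fits a memory budget. The sketch below is illustrative only: `pick_quant`, `approx_bits_per_weight`, and the 1 GB default headroom (a stand-in for KV cache and runtime buffers) are hypothetical and not part of any tool. The bits-per-weight estimate assumes the F16 file stores 16 bits for every weight and ignores file metadata.

```python
from typing import Optional

# File sizes in GB, copied from the GGUF table (largest first).
GGUF_SIZES_GB = {
    "F32": 15.68,
    "F16": 7.85,
    "BF16": 7.85,
    "Q8_0": 4.17,
    "Q6_K": 3.23,
    "Q5_K_M": 2.90,
    "Q4_K_M": 2.52,
    "Q3_K_M": 2.10,
    "Q2_K": 1.67,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed file that fits in budget_gb minus headroom_gb."""
    usable = budget_gb - headroom_gb
    fitting = {name: size for name, size in GGUF_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def approx_bits_per_weight(name: str) -> float:
    """Estimate bits per weight relative to the F16 file (assumed 16 bits)."""
    return 16.0 * GGUF_SIZES_GB[name] / GGUF_SIZES_GB["F16"]


print(pick_quant(4.0))                        # largest file that fits in 3 GB usable
print(f"{approx_bits_per_weight('Q8_0'):.1f}")  # rough bits per weight for Q8_0
```

The headroom value is a rough guess; long contexts need more memory for the KV cache, so it may be worth raising `headroom_gb` before trusting the result.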
⚠️ Limitations
- English only: multilingual performance has not been tested
- Specialized model: optimized for identity and safety, so scores on general benchmarks may show trade-offs
- Not for high-stakes domains: medical, legal, and financial use requires additional safeguards
- Small model: at roughly 4B parameters, it has less general knowledge than larger models
🔮 What's Next
Aethon v2 is planned with:
- 🎯 Larger base models (8B+)
- 📚 Expanded dataset (10K+ samples)
- 📈 Benchmark-targeted training
- 🧪 DPO/RLHF alignment training
📜 License
Apache 2.0 — consistent with Qwen3.5-4B.
Model tree for Featherlabs/Aethon-4b
Base model
Qwen/Qwen3.5-4B-Base