Instructions to use Featherlabs/Aura-7b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Featherlabs/Aura-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Featherlabs/Aura-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# The pipeline returns a list of dicts; print it so the reply is visible in a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Featherlabs/Aura-7b")
model = AutoModelForCausalLM.from_pretrained("Featherlabs/Aura-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use Featherlabs/Aura-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Featherlabs/Aura-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Featherlabs/Aura-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
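
The same request can be made from Python. The following is a minimal sketch using only the standard library, assuming a vLLM server is already running locally on port 8000 as in the commands above. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the server was started with `vllm serve "Featherlabs/Aura-7b"`
# and listens on the default port used in the curl example.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt, model="Featherlabs/Aura-7b", url=BASE_URL):
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` should print the model's reply. Pointing `url` at port 30000 would target an SGLang server instead.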

Use Docker

docker model run hf.co/Featherlabs/Aura-7b

SGLang

How to use Featherlabs/Aura-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Featherlabs/Aura-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Featherlabs/Aura-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Featherlabs/Aura-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Featherlabs/Aura-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Featherlabs/Aura-7b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Featherlabs/Aura-7b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Featherlabs/Aura-7b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Featherlabs/Aura-7b to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Featherlabs/Aura-7b",
    max_seq_length=2048,
)

Docker Model Runner
How to use Featherlabs/Aura-7b with Docker Model Runner:
```
docker model run hf.co/Featherlabs/Aura-7b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

🔥 Aura-7b

A small model that punches above its weight

Agentic · Tool Use · Function Calling · Reasoning

Built by Featherlabs · Operated by Owlkun

✨ What is Aura-7b?

Aura-7b is a 7B-parameter language model fine-tuned for agentic AI workflows: structured reasoning, function calling, multi-step task execution, and tool orchestration. It is built on Qwen2.5-7B-Instruct and trained on Featherlabs Agentic v1, a curated dataset of roughly 14.7K multi-turn agentic conversations.

🎯 Built For

Capability	Description
🔧 Tool Use	Structured JSON function calling with tool schemas
🧩 Multi-Step Planning	Breaking complex tasks into executable steps
🧠 Chain-of-Thought	Internal reasoning via `<think>` tags before acting
💬 Conversation	Coherent, context-aware multi-turn dialogue
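
As a rough illustration of the Tool Use row, the sketch below shows the host side of function calling: a tool schema, a registry, and a dispatcher for a JSON call. The OpenAI-style schema layout, the `{"name": ..., "arguments": ...}` call shape, and the `get_weather` tool are all assumptions made for the example; this card does not specify the exact format Aura emits.

```python
import json


# Hypothetical tool: name, parameter, and return value are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"


# Assumed OpenAI-style schema describing the tool to the model.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Map tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(call_json: str) -> str:
    """Parse a JSON tool call, check required arguments, and run the tool."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    schema = next(t["function"] for t in TOOLS if t["function"]["name"] == name)
    args = call.get("arguments", {})
    missing = [k for k in schema["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return REGISTRY[name](**args)
```

In a real loop, the string returned by `dispatch` would be appended to the conversation as a tool result before asking the model to continue.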

📊 Benchmarks

Evaluated with EleutherAI lm-evaluation-harness · 5-shot prompting

Benchmark	Aura-7b	Qwen2.5-7B	Llama-3.1-8B	Mistral-7B	Gemma-2-9B	Phi-3.5-Mini
MMLU	64.1	68.7	69.4	64.5	71.3	69.0
ARC-C	53.6	62.0	83.4	62.0	68.4	61.5
HellaSwag	74.1	65.4	78.5	81.2	81.9	69.8
WinoGrande	69.4	74.0	73.5	78.7	80.6	68.5
GSM8K	77.6	90.1	84.5	57.0	68.6	86.2
TruthfulQA	49.5	63.1	53.5	59.5	45.3	52.4
Average	64.7	70.6	73.8	67.2	69.4	67.9
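
The Average row can be re-derived from the six per-benchmark scores. The sketch below assumes an unweighted mean and a tolerance that allows for rounding to one decimal place; the scores are copied from the table, and the function name is illustrative.

```python
# Per-model scores in table order:
# MMLU, ARC-C, HellaSwag, WinoGrande, GSM8K, TruthfulQA
SCORES = {
    "Aura-7b": [64.1, 53.6, 74.1, 69.4, 77.6, 49.5],
    "Qwen2.5-7B": [68.7, 62.0, 65.4, 74.0, 90.1, 63.1],
    "Llama-3.1-8B": [69.4, 83.4, 78.5, 73.5, 84.5, 53.5],
    "Mistral-7B": [64.5, 62.0, 81.2, 78.7, 57.0, 59.5],
    "Gemma-2-9B": [71.3, 68.4, 81.9, 80.6, 68.6, 45.3],
    "Phi-3.5-Mini": [69.0, 61.5, 69.8, 68.5, 86.2, 52.4],
}

# Values from the Average row of the table.
REPORTED_AVERAGE = {
    "Aura-7b": 64.7,
    "Qwen2.5-7B": 70.6,
    "Llama-3.1-8B": 73.8,
    "Mistral-7B": 67.2,
    "Gemma-2-9B": 69.4,
    "Phi-3.5-Mini": 67.9,
}


def average_mismatches(tolerance=0.06):
    """Return models whose unweighted mean differs from the reported average."""
    bad = []
    for model, scores in SCORES.items():
        mean = sum(scores) / len(scores)
        if abs(mean - REPORTED_AVERAGE[model]) > tolerance:
            bad.append(model)
    return bad
```

An empty list from `average_mismatches()` means every reported average agrees with the simple mean of its column.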

💡 Key Takeaways

🟢 HellaSwag: +8.7 points over base Qwen2.5-7B (74.1 vs 65.4), indicating stronger commonsense reasoning
🟢 GSM8K: 77.6%, ahead of Mistral-7B (+20.6 points) and Gemma-2-9B (+9.0 points) with no math-specific training
ℹ️ Drops on MMLU, ARC-C, and TruthfulQA are expected: they are a trade-off of full SFT on a specialized agentic dataset
ℹ️ Standard benchmarks don't capture Aura's primary strengths: tool use, multi-step planning, and instruction adherence
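
The gains quoted above are differences in accuracy (percentage points), not relative improvements. A short sketch of the arithmetic, using scores from the benchmark table; the function names are illustrative.

```python
def point_gain(score, baseline):
    """Absolute difference in percentage points."""
    return round(score - baseline, 1)


def relative_gain(score, baseline):
    """Relative improvement over the baseline, in percent."""
    return round(100 * (score - baseline) / baseline, 1)


# HellaSwag: Aura-7b vs Qwen2.5-7B
hellaswag_points = point_gain(74.1, 65.4)
hellaswag_relative = relative_gain(74.1, 65.4)

# GSM8K: Aura-7b vs Mistral-7B and Gemma-2-9B
gsm8k_vs_mistral = point_gain(77.6, 57.0)
gsm8k_vs_gemma = point_gain(77.6, 68.6)

print(hellaswag_points, hellaswag_relative, gsm8k_vs_mistral, gsm8k_vs_gemma)
```

The point differences come out to 8.7, 20.6, and 9.0, while the relative HellaSwag gain is about 13.3%, which is why the two readings should not be mixed.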

Note: Aura v2 (codename Aethon) is in development with a much larger, more diverse dataset targeting all benchmarks. Stay tuned! 🚀

🚀 Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Featherlabs/Aura-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Aura, a helpful agentic AI assistant created by Featherlabs."},
    {"role": "user", "content": "Search the web for the latest AI agent frameworks and summarize the top 3."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)

print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
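
Because the model may reason inside `<think>` tags before answering, it can be useful to separate that reasoning from the final reply. The helper below is a minimal sketch that assumes the decoded text wraps reasoning as `<think>...</think>`; the closing tag and the name `split_reasoning` are assumptions, not something this card documents.

```python
import re

# Assumed format: reasoning enclosed in <think>...</think>, possibly multi-line.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a decoded completion."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

For example, `split_reasoning(decoded)` could be applied to the string produced by `tokenizer.decode(...)`, showing only the second element to end users.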

🏋️ Training Details

Property	Value
Base model	Qwen2.5-7B-Instruct
Dataset	Featherlabs Agentic v1 (14,676 samples)
Training type	Full Supervised Fine-Tuning (SFT)
Epochs	5
Warmup steps	10
Context length	8,192 tokens
Precision	BF16
Optimizer	AdamW 8-bit
LR scheduler	Cosine
Framework	Unsloth + TRL (SFTTrainer)
Hardware	AMD MI300X (192GB HBM3)
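
To make the schedule settings concrete, the sketch below computes a learning rate with linear warmup followed by cosine decay to zero. The sample count, epochs, and warmup steps come from the table; the effective batch size (16) and peak learning rate (2e-5) are not stated in this card and are assumptions chosen only for illustration.

```python
import math

NUM_SAMPLES = 14_676  # from the table
EPOCHS = 5            # from the table
WARMUP_STEPS = 10     # from the table
BATCH_SIZE = 16       # assumption: effective batch size is not given
PEAK_LR = 2e-5        # assumption: peak learning rate is not given


def total_steps(num_samples=NUM_SAMPLES, epochs=EPOCHS, batch_size=BATCH_SIZE):
    """Optimizer steps across all epochs, rounding up the last partial batch."""
    return math.ceil(num_samples / batch_size) * epochs


def lr_at(step, total, warmup=WARMUP_STEPS, peak=PEAK_LR):
    """Linear warmup to `peak`, then cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


steps = total_steps()
for s in (0, 5, 10, steps // 2, steps):
    print(s, lr_at(s, steps))
```

Under these assumptions the run has 4,590 optimizer steps, so the 10 warmup steps cover well under 1% of training.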

Dataset Composition

The model was trained on Featherlabs Agentic v1, a curated blend of:

Source	Samples	Purpose
glaiveai/glaive-function-calling-v2	10,000	Function calling with tool schemas
Salesforce/xlam-function-calling-60k	2,350	Identity & behavioral framing
distilled_corpus_400k_with_cot	2,326	Chain-of-thought reasoning
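
The blend proportions follow directly from the sample counts. A small sketch of the arithmetic, with counts copied from the table above:

```python
# Sample counts per source, from the Dataset Composition table.
SOURCES = {
    "glaiveai/glaive-function-calling-v2": 10_000,
    "Salesforce/xlam-function-calling-60k": 2_350,
    "distilled_corpus_400k_with_cot": 2_326,
}

TOTAL = sum(SOURCES.values())

# Share of each source in percent, rounded to one decimal place.
SHARES = {name: round(100 * count / TOTAL, 1) for name, count in SOURCES.items()}

for name, share in SHARES.items():
    print(f"{name}: {share}%")
```

The counts sum to 14,676, matching the dataset size listed under Training Details, with function-calling data from glaive making up roughly two thirds of the blend.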

📦 GGUF Quantizations

For local inference with llama.cpp, Ollama, or LM Studio:

👉 Featherlabs/Aura-7b-GGUF

Quantization	Size	Quality	Best For
`f16`	15.2 GB	⭐⭐⭐⭐⭐	Maximum quality, high VRAM
`q8_0`	8.1 GB	⭐⭐⭐⭐⭐	Near-lossless
`q6_k`	6.25 GB	⭐⭐⭐⭐	High quality, moderate VRAM
`q4_k_m`	4.68 GB	⭐⭐⭐⭐	🏆 Recommended for most users
`q2_k`	3.02 GB	⭐⭐⭐	Minimum RAM / CPU-only
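
One way to read the table is to pick the largest file that fits in available memory. The sketch below does that using the sizes listed above; the 1.5 GB headroom for the KV cache and runtime is an assumed placeholder, and real requirements depend on context length and the inference app.

```python
# (quantization, file size in GB) from the table, largest first.
QUANTS = [
    ("f16", 15.2),
    ("q8_0", 8.1),
    ("q6_k", 6.25),
    ("q4_k_m", 4.68),
    ("q2_k", 3.02),
]


def pick_quant(available_gb, headroom_gb=1.5):
    """Return the largest quantization that fits, or None if none does.

    headroom_gb is an assumed allowance for KV cache and runtime overhead.
    """
    budget = available_gb - headroom_gb
    for name, size_gb in QUANTS:
        if size_gb <= budget:
            return name
    return None
```

For instance, `pick_quant(8.0)` selects `q6_k` under the default headroom, while `pick_quant(4.0)` returns `None` because even `q2_k` would not leave the assumed margin.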

⚠️ Limitations

English only — multilingual performance not tested
Specialized model — general knowledge benchmarks show expected trade-offs vs base model
Not for high-stakes domains — medical, legal, financial use requires additional safeguards
TruthfulQA (49.5%) — some susceptibility to common misconceptions

🔮 What's Next

Aethon (Aura v2) is currently in development with:

🎯 Qwen3-8B as the new base model
📚 ~165K sample diverse dataset across 6 categories
🧪 LoRA → Full FT hybrid training approach
📈 Targeting all Open LLM Leaderboard benchmarks

📜 License

Apache 2.0 — consistent with Qwen2.5-7B-Instruct.

Built with ❤️ by Featherlabs

Operated by Owlkun

Evaluation results (accuracy, self-reported)

Benchmark	Accuracy
MMLU	64.130
GSM8K	77.560
HellaSwag	74.050
ARC-Challenge	53.580
WinoGrande	69.380
TruthfulQA	49.520