🔥 Aura-7b

A small model that punches above its weight

Agentic · Tool Use · Function Calling · Reasoning


Built by Featherlabs · Operated by Owlkun


✨ What is Aura-7b?

Aura-7b is a 7B-parameter language model fine-tuned for agentic AI workflows — structured reasoning, function calling, multi-step task execution, and tool orchestration. It is built on top of Qwen2.5-7B-Instruct and trained on Featherlabs Agentic v1, a curated dataset of 14.7K multi-turn agentic conversations.

🎯 Built For

| Capability | Description |
| --- | --- |
| 🔧 Tool Use | Structured JSON function calling with tool schemas |
| 🧩 Multi-Step Planning | Breaking complex tasks into executable steps |
| 🧠 Chain-of-Thought | Internal reasoning via `<think>` tags before acting |
| 💬 Conversation | Coherent, context-aware multi-turn dialogue |
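As a rough illustration of the structured-JSON tool-use style above (the exact tool-call format is defined by the model's chat template; the schema and `get_weather` function here are hypothetical):

```python
import json

# A hypothetical tool schema in the common JSON-schema style.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Suppose the model responds with a JSON tool call like this...
model_output = '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'

# ...the host application parses it and dispatches to the real function.
call = json.loads(model_output)
assert call["name"] == weather_tool["name"]
print(call["arguments"]["city"])  # → Tokyo
```

The point is that the model emits machine-parseable JSON rather than free text, so the surrounding agent loop can validate and route the call.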

📊 Benchmarks

Evaluated with EleutherAI lm-evaluation-harness · 5-shot prompting

| Benchmark | Aura-7b | Qwen2.5-7B | Llama-3.1-8B | Mistral-7B | Gemma-2-9B | Phi-3.5-Mini |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 64.1 | 68.7 | 69.4 | 64.5 | 71.3 | 69.0 |
| ARC-C | 53.6 | 62.0 | 83.4 | 62.0 | 68.4 | 61.5 |
| HellaSwag | 74.1 | 65.4 | 78.5 | 81.2 | 81.9 | 69.8 |
| WinoGrande | 69.4 | 74.0 | 73.5 | 78.7 | 80.6 | 68.5 |
| GSM8K | 77.6 | 90.1 | 84.5 | 57.0 | 68.6 | 86.2 |
| TruthfulQA | 49.5 | 63.1 | 53.5 | 59.5 | 45.3 | 52.4 |
| **Average** | **64.7** | **70.6** | **73.8** | **67.2** | **69.4** | **67.9** |

💡 Key Takeaways

  • 🟢 HellaSwag 74.1 vs 65.4 for base Qwen2.5-7B (+8.7 points) — stronger commonsense reasoning
  • 🟢 GSM8K 77.6 — beats Mistral-7B (57.0, +20.6 points) and Gemma-2-9B (68.6, +9.0 points) with no math-specific training
  • ℹ️ Drops on MMLU/ARC/TruthfulQA are expected — trade-off of full SFT on a specialized agentic dataset
  • ℹ️ Standard benchmarks don't capture Aura's primary strengths: tool use, multi-step planning, and instruction adherence

Note: Aura v2 (codename Aethon) is in development with a much larger, diverse dataset targeting all benchmarks. Stay tuned! 🚀


🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Featherlabs/Aura-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Aura, a helpful agentic AI assistant created by Featherlabs."},
    {"role": "user", "content": "Search the web for the latest AI agent frameworks and summarize the top 3."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
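Since Aura reasons inside `<think>` tags before acting, you may want to separate the reasoning from the final answer before showing it to a user. A minimal sketch, assuming the output contains at most one well-formed `<think>…</think>` block:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (reasoning, answer) around a <think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>The user wants a summary, so I should search first.</think>"
    "Here are the top 3 frameworks..."
)
print(answer)  # → Here are the top 3 frameworks...
```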

🏋️ Training Details

| Property | Value |
| --- | --- |
| Base model | Qwen2.5-7B-Instruct |
| Dataset | Featherlabs Agentic v1 (14,676 samples) |
| Training type | Full Supervised Fine-Tuning (SFT) |
| Epochs | 5 |
| Warmup steps | 10 |
| Context length | 8,192 tokens |
| Precision | BF16 |
| Optimizer | AdamW 8-bit |
| LR scheduler | Cosine |
| Framework | Unsloth + TRL (SFTTrainer) |
| Hardware | AMD MI300X (192GB HBM3) |
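The linear-warmup-then-cosine schedule from the table can be sketched as a plain function (the peak learning rate of 2e-5 is an illustrative placeholder; the actual value is not listed above):

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2e-5, warmup: int = 10) -> float:
    """Linear warmup for `warmup` steps, then cosine decay from peak_lr to zero."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(9, 1000))     # last warmup step: peak LR
print(lr_at(1000, 1000))  # end of training: ~0
```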

Dataset Composition

The model was trained on Featherlabs Agentic v1, a curated blend of:

| Source | Samples | Purpose |
| --- | --- | --- |
| glaiveai/glaive-function-calling-v2 | 10,000 | Function calling with tool schemas |
| Salesforce/xlam-function-calling-60k | 2,350 | Identity & behavioral framing |
| distilled_corpus_400k_with_cot | 2,326 | Chain-of-thought reasoning |
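To make the blend above concrete, here is a hypothetical shape for a single multi-turn agentic sample (the actual schema of Featherlabs Agentic v1 is not published here; this is an assumed chat-format layout combining reasoning, a tool call, and a tool result):

```python
import json

# Illustrative only — not the actual dataset schema.
sample = {
    "messages": [
        {"role": "system", "content": "You are Aura. Tools: get_weather(city)"},
        {"role": "user", "content": "What's the weather in Tokyo?"},
        {"role": "assistant", "content": "<think>I need the weather tool.</think>"
                                         '{"name": "get_weather", "arguments": {"city": "Tokyo"}}'},
        {"role": "tool", "content": '{"temp_c": 18, "condition": "cloudy"}'},
        {"role": "assistant", "content": "It's currently 18°C and cloudy in Tokyo."},
    ]
}
print(len(sample["messages"]))  # → 5
```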

📦 GGUF Quantizations

For local inference with llama.cpp, Ollama, or LM Studio:

👉 Featherlabs/Aura-7b-GGUF

| Quantization | Size | Quality | Best For |
| --- | --- | --- | --- |
| f16 | 15.2 GB | ⭐⭐⭐⭐⭐ | Maximum quality, high VRAM |
| q8_0 | 8.1 GB | ⭐⭐⭐⭐⭐ | Near-lossless |
| q6_k | 6.25 GB | ⭐⭐⭐⭐ | High quality, moderate VRAM |
| q4_k_m | 4.68 GB | ⭐⭐⭐⭐ | 🏆 Recommended for most users |
| q2_k | 3.02 GB | ⭐⭐⭐ | Minimum RAM / CPU-only |
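For Ollama, a minimal Modelfile pointing at a downloaded quant might look like this (the filename and parameter values are illustrative; check the GGUF repo for the actual file names):

```text
# Modelfile — illustrative; adjust the path to your downloaded quant
FROM ./aura-7b-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."
```

Then build and run with `ollama create aura -f Modelfile` followed by `ollama run aura`.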

⚠️ Limitations

  • English only — multilingual performance not tested
  • Specialized model — general knowledge benchmarks show expected trade-offs vs base model
  • Not for high-stakes domains — medical, legal, financial use requires additional safeguards
  • TruthfulQA (49.5%) — some susceptibility to common misconceptions

🔮 What's Next

Aethon (Aura v2) is currently in development with:

  • 🎯 Qwen3-8B as the new base model
  • 📚 ~165K sample diverse dataset across 6 categories
  • 🧪 LoRA → Full FT hybrid training approach
  • 📈 Targeting all Open LLM Leaderboard benchmarks

📜 License

Apache 2.0 — consistent with Qwen2.5-7B-Instruct.


Built with ❤️ by Featherlabs

Operated by Owlkun
