AUM-1-70B

AUM (अ उ म): the primordial sound. The first. The foundation.

AUM-1-70B is a 70-billion-parameter thinking model built on LLaMA 3 70B. It is the first model in the AUM series, a research effort focused on building open, transparent, reasoning-first language models through knowledge distillation, supervised fine-tuning, and synthetic data generation.

AUM-1-70B externalizes its reasoning inside `<think>` tags before producing a final answer, giving full transparency into how the model arrives at its conclusions.


Model Details

| Property | Value |
|---|---|
| Base Model | meta-llama/Meta-Llama-3-70B |
| Parameters | 70B |
| Architecture | LLaMA 3 (decoder-only transformer) |
| Training Method | Distillation + SFT + thinking traces |
| Thinking Format | `<think>...</think>` tags (trained in, not prompted) |
| Precision | bfloat16 |
| Context Length | 8,192 tokens |
| Release Date | September 2025 |
| License | LLaMA 3 Community License |

What Makes AUM Different

Most fine-tuned models are trained to produce answers. AUM is trained to produce reasoning: the full chain of thought that leads to an answer.

This is inspired by the Orca paper (Microsoft, 2023), which showed that smaller models can match much larger ones by learning from the reasoning traces of frontier models, not just their outputs.

AUM combines three training strategies:

1. **Knowledge Distillation (Orca-style).** Frontier models (GPT-4, Claude) were used to generate detailed reasoning trajectories. AUM learned to think by imitating how much larger models reason, internalizing step-by-step decomposition, self-correction, and structured thinking.

2. **Benchmark-Specific SFT.** Fine-tuned on the training splits of popular public benchmarks. This teaches the underlying skills without contaminating held-out test sets.

3. **Thinking Format Training.** AUM is trained to wrap internal reasoning in `<think>` tags. This is not a prompt trick; the model learned the format from training data where reasoning traces were explicitly structured this way.
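
To make the thinking-format training concrete, here is a hypothetical sketch of what a single SFT example with an embedded reasoning trace might look like. The dict shape and field names are illustrative assumptions, not the actual AUM training schema:

```python
# Hypothetical shape of one thinking-format SFT example.
# The field names ("prompt", "response") are illustrative assumptions,
# not the actual AUM training schema.
example = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed?",
    "response": (
        "<think>\n"
        "Average speed = distance / time.\n"
        "120 / 1.5 = 80\n"
        "</think>\n\n"
        "The average speed of the train is 80 km/h."
    ),
}

# The target sequence contains both the reasoning trace and the final
# answer, so the model learns to emit the <think> block itself.
assert example["response"].startswith("<think>")
```

Because the `<think>` block sits inside the supervised target, the format is learned at the weight level rather than being requested at inference time.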


Thinking Format

AUM outputs reasoning before its final answer:

```
User: A train travels 120 km in 1.5 hours. What is its average speed?

AUM: <think>
The formula for average speed is distance divided by time.
Distance = 120 km
Time = 1.5 hours
Speed = 120 / 1.5 = 80 km/h
</think>

The average speed of the train is 80 km/h.
```

How to Run

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Nitish-Garikoti/aum-1-70B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "What is the derivative of x^3 + 2x^2 - 5x + 1?"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Parsing the Think Block

```python
import re

def parse_aum_response(text):
    """Split an AUM completion into its reasoning trace and final answer."""
    # Capture the first <think>...</think> block (DOTALL lets . span newlines).
    think = re.search(r'<think>(.*?)</think>', text, re.DOTALL)
    # The answer is whatever remains once think blocks are stripped.
    answer = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL).strip()
    return {
        "thinking": think.group(1).strip() if think else None,
        "answer": answer,
    }
```

Training Details

Datasets

AUM-1-70B was trained on a multi-layered dataset combining public benchmarks and private synthetic data.

Public Datasets (Train Splits Only)

Only train splits were used; test splits remain untouched to preserve benchmark integrity.

| Dataset | Domain | Purpose |
|---|---|---|
| open-thoughts/OpenThoughts-114k | Reasoning | Core thinking traces; teaches the `<think>` format |
| openai/gsm8k | Math | Arithmetic and multi-step reasoning |
| AI-MO/NuminaMath-CoT | Math | Advanced math with chain-of-thought |
| openai/humaneval | Coding | Python function generation |
| princeton-nlp/SWE-bench | Coding | Real-world GitHub issue resolution |
| cais/mmlu | Knowledge | Multi-domain academic QA |
| EleutherAI/hellaswag | NLU | Commonsense reasoning |
| allenai/ai2_arc | Science | Multi-step science QA |
| HuggingFaceH4/ultrafeedback_binarized | Alignment | Instruction following |

Private Synthetic Datasets

A significant portion of AUM's training data is private and was generated using Orca-style distillation:

  • Reasoning trajectories: generated by prompting frontier models (GPT-4, Claude) with diverse tasks, capturing full chain-of-thought responses formatted with `<think>` tags
  • Task-specific SFT data: custom instruction-response pairs targeting specific capability gaps
  • Benchmark augmentation: synthetic variants of public benchmark problems to increase diversity

Why This Is Not Contamination

AUM uses only the train splits of public benchmarks. The model learns the skill (e.g., mathematical reasoning), not the specific test answers. This is the same methodology used by DeepSeek, Qwen, and other leading open-weight models.
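
One way to enforce this policy mechanically in a data pipeline is a guard that rejects anything but train splits. A minimal sketch; this helper is hypothetical and not part of any published AUM tooling:

```python
def assert_train_split(dataset_name: str, split: str) -> str:
    """Refuse to load anything but the train split of a public benchmark.

    Hypothetical guard illustrating the contamination policy: test
    splits stay held out for evaluation.
    """
    if split != "train":
        raise ValueError(
            f"Refusing to load split '{split}' of {dataset_name}: "
            "only train splits are used for training."
        )
    return split

# Allowed: the train split passes through unchanged.
assert assert_train_split("openai/gsm8k", "train") == "train"
```

Wiring such a check in front of every dataset load makes the "train splits only" rule auditable rather than a matter of convention.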

Training Hyperparameters

| Parameter | Value |
|---|---|
| Learning Rate | 2e-5 (cosine decay) |
| Batch Size | 32 (gradient accumulation × 8) |
| Epochs | 3 |
| Max Sequence Length | 4,096 tokens |
| Optimizer | AdamW (β1 = 0.9, β2 = 0.95) |
| Warmup Steps | 100 |
| Mixed Precision | bfloat16 |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| LoRA Target Modules | q_proj, v_proj, k_proj, o_proj |
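
Assuming the standard Hugging Face PEFT library, a LoRA configuration matching the values in the table might look like this. A sketch, not the actual training script; the dropout value is an assumption, as it is not listed above:

```python
from peft import LoraConfig

# LoRA settings taken from the hyperparameter table; lora_dropout is an
# assumed default since the table does not specify it.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
```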

Evaluation

| Benchmark | Domain | Metric | Score |
|---|---|---|---|
| GSM8K (test) | Math | Accuracy | ~88.5% |
| MMLU (test) | Knowledge | Accuracy | ~79.2% |
| HumanEval (test) | Coding | Pass@1 | ~74.4% |
| HellaSwag (test) | NLU | Accuracy | ~87.3% |
| ARC-Challenge (test) | Science | Accuracy | ~80.1% |

Hardware Requirements

| Setup | Configuration |
|---|---|
| Inference (full precision) | 2× A100 80GB |
| Inference (4-bit quantized) | 1× A100 40GB |
| Fine-tuning (LoRA) | 4× A100 80GB |
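
For the single-GPU 4-bit path, transformers' bitsandbytes integration can be used. A sketch of the quantization config, assuming bitsandbytes is installed; NF4 and bfloat16 compute are common defaults, not values published for AUM:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit quantization settings; pass this as quantization_config= to
# AutoModelForCausalLM.from_pretrained("Nitish-Garikoti/aum-1-70B", ...).
# NF4 quantization and bfloat16 compute are assumed defaults.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```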

Limitations

  • Context window: 8,192 tokens; long documents require chunking
  • `<think>` overhead: reasoning blocks add to the token count, so set max_new_tokens accordingly
  • English-primary: trained predominantly on English data
  • Not RLHF-aligned: an SFT-only model; it may not refuse harmful requests reliably
  • Hallucination: like all LLMs, it can produce confident but incorrect reasoning
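
For documents longer than the context window, a simple overlapping chunker is one way to stay under the limit. A rough sketch using a whitespace-word approximation; real budgets should be measured with the model's tokenizer, and the word limit here is an assumed headroom below 8,192 tokens:

```python
def chunk_words(text: str, max_words: int = 6000, overlap: int = 200):
    """Split text into overlapping chunks counted in whitespace words.

    Word counts only approximate tokenizer tokens; the 6,000-word
    default (an assumption) leaves headroom under the 8,192-token
    context for the prompt template and the <think> block.
    """
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        # Step forward by less than the window so chunks overlap.
        start += max_words - overlap
    return chunks
```

Each chunk can then be sent through the generation example above, with the per-chunk answers merged afterwards.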

Citation

```bibtex
@misc{garikoti2025aum,
  title={AUM-1-70B: A Thinking Model via Distillation and Task-Specific Fine-Tuning},
  author={Garikoti, Nitish},
  year={2025},
  url={https://huggingface.co/Nitish-Garikoti/aum-1-70B}
}
```

Built with 🔥 by Nitish Garikoti
