Atem v1

Ancient logic. Modern intelligence.

A 1.5B reasoning model trained via multi-source knowledge distillation from frontier teacher models.

Overview

Atem is a 1.5B parameter reasoning model built via supervised fine-tuning on a curated corpus of approximately 115,000 examples distilled from multiple frontier teacher models. Starting from Qwen2.5-1.5B-Instruct, Atem was trained using LoRA to preserve base model capabilities while improving performance on reasoning, mathematics, and coding tasks.

This is Stage 1 of a planned multi-stage training series. Stage 1 focuses on establishing strong general reasoning across domains. Stage 2, Atem-Wisdom, builds on this foundation by adding explicit chain-of-thought reasoning: the model works through problems inside <think> tags before producing its final answer.

Model Details

Property	Value
Base model	Qwen/Qwen2.5-1.5B-Instruct
Training method	LoRA Supervised Fine-Tuning (Stage 1)
LoRA config	r=32, alpha=64, dropout=0.05
Target modules	q, k, v, o, gate, up, down projections
Parameters	~1.54B
Training records	~114,932
Epochs	1
Effective batch size	64 (batch 8 × grad accum 8)
Learning rate	2e-4, cosine schedule, 5% warmup
Final train loss	0.940
Final val loss	0.890
Hardware	NVIDIA A100-SXM4 80GB
Max sequence length	4,096 tokens
Precision	bfloat16
License	Apache 2.0

Intended Use

Atem is designed for open-ended reasoning tasks where structured, accurate thinking adds value:

Code explanation, implementation, and debugging
Mathematical problem solving with working shown
Analytical reasoning and hypothesis evaluation
Concept explanation and comparative analysis
Logic, argument, and fallacy identification

Atem is not designed for retrieval-heavy factual lookup, real-time information, or tasks requiring broad knowledge breadth beyond its training domains.

Training Data

Atem was trained on a corpus assembled from eleven sources, combining domain-specific generated datasets and publicly available distillation datasets from frontier models. All outputs containing <think> reasoning traces were stripped to clean final responses for Stage 1 training.

Dataset	Records	Source / Teacher
EphAsad/QWENMillenium-SF	5,000	Qwen2.5-14B — Analytical & Scientific
EphAsad/Phi4Millennium-SF	2,932	Phi-4 14B — Mathematical Reasoning
EphAsad/MistralMillenium-SF	5,000	Mistral-Nemo-12B — Language & Comprehension
Modotte/CodeX-2M-Thinking	30,000	Mixed — Coding
Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned	23,000	Kimi K2.5 — General Distillation (English filtered)
WithinUsAI/MiniMax_M2.7_Distilled_5k	5,000	MiniMax M2.7
tuanha1305/DeepSeek-R1-Distill	9,000	DeepSeek-R1
open-r1/OpenThoughts-114k-math	10,000	Mixed — Mathematics (correct answers only)
flytech/python-codes-25k	10,000	Python coding
FreedomIntelligence/medical-o1-reasoning-SFT	10,000	Medical reasoning (English config)
Private dataset	5,000	Undisclosed
Total	~114,932

The QWENMillenium-SF, Phi4Millennium-SF, and MistralMillenium-SF datasets were generated specifically for this project via batched inference on Colab A100. OpenThoughts-114k-math was filtered to verified correct solutions only before sampling.
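The think-trace stripping described above can be sketched as follows. This is a minimal illustration, assuming traces are wrapped in literal `<think>` and `</think>` tags; the helper name `strip_think` and the sample string are hypothetical, and the author's actual preprocessing code is not shown in this card.

```python
import re

# Matches one <think>...</think> block (newlines included) plus trailing whitespace.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove every <think>...</think> block and return the remaining response."""
    return THINK_BLOCK.sub("", text).strip()


# Hypothetical teacher output containing a reasoning trace.
sample = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
cleaned = strip_think(sample)
print(cleaned)
```

The non-greedy `.*?` keeps each match confined to a single trace, so a record with several traces loses all of them while the text between them is preserved.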

Training Configuration

# Key hyperparameters
lora_r            = 32
lora_alpha        = 64
lora_dropout      = 0.05
max_seq_length    = 4096
learning_rate     = 2e-4
lr_scheduler      = 'cosine'
warmup_ratio      = 0.05
batch_size        = 8
grad_accumulation = 8       # effective batch size: 64
num_epochs        = 1
dtype             = 'bfloat16'
load_in_4bit      = True    # during training

Training used Unsloth with train_on_responses_only masking, so that loss was computed exclusively on assistant response tokens. Three checks were run before training started: chat template replacement verification, think tag strip confirmation, and a mask sanity check.
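The idea behind response-only masking can be illustrated without any training library. The sketch below is not Unsloth's implementation; the function `mask_non_assistant`, the token ids, and the span positions are all made up for illustration. It relies only on the fact that PyTorch's cross-entropy loss ignores labels equal to -100 by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def mask_non_assistant(token_ids, assistant_spans):
    """Copy token ids into labels inside assistant spans; ignore everything else.

    assistant_spans is a list of (start, end) index pairs, end exclusive.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end in assistant_spans:
        labels[start:end] = token_ids[start:end]
    return labels


# Hypothetical sequence: three prompt tokens followed by four response tokens.
token_ids = [11, 12, 13, 21, 22, 23, 24]
labels = mask_non_assistant(token_ids, [(3, 7)])
print(labels)  # [-100, -100, -100, 21, 22, 23, 24]
```

A mask sanity check of the kind mentioned above could then confirm that every unmasked label falls inside an assistant turn.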

After training, LoRA adapters were merged into the base weights and exported as a full merged model.
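As a rough sketch of what merging means, standard LoRA folds the low-rank update into each target weight as W + (alpha / r) · B · A. The example below uses the card's alpha=64 and r=32 for the scale only; the 2×2 weight and rank-1 adapter matrices are toy values chosen for readability, and the helper names are hypothetical. The actual export was presumably done through the training library rather than by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, scale):
    """Return w + scale * (lora_b @ lora_a), computed element by element."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


scale = 64 / 32  # alpha / r from the model card

# Toy values: a 2x2 base weight and a rank-1 adapter (the real adapters use rank 32).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [0.0]]
lora_a = [[0.5, 0.5]]

merged = merge_lora(w, lora_b, lora_a, scale)
print(merged)  # [[2.0, 1.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the repository can ship a single set of full weights.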

Loss curve:

Step	Train Loss	Val Loss
500	0.990	0.920
1000	1.020	0.900
1500	0.960	0.890
Final	0.940	0.890

Validation loss converged at 0.890. Final train loss (0.940) sits 0.050 above validation loss, so there is no sign of overfitting over the single epoch.

Evaluation

Benchmark Results

Evaluated against Qwen2.5-1.5B-Instruct (base model) using lm-evaluation-harness with identical conditions: 4-bit inference, batch size 16, zero-shot strict evaluation.

Task	Base (1.5B)	Atem v1 (1.5B)	Delta
ARC-Challenge	43.7%	45.5%	+1.8% ✓
GSM8K	23.0%	53.0%	+30.0% ✓
HellaSwag	66.8%	64.4%	-2.4%

The GSM8K result is the primary finding. A +30 percentage point improvement on grade school mathematics reflects the targeted training on verified correct mathematical reasoning examples from multiple frontier teacher models.

The HellaSwag regression of 2.4 percentage points is within normal benchmark variance and is a substantial improvement over a prior exploratory full fine-tune run, which produced a 16.2-point regression on the same benchmark. LoRA preserved base model commonsense capabilities as intended.
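The exact evaluation command is not given in this card. The sketch below shows one way the stated conditions (4-bit inference, batch size 16, zero-shot) might map onto the lm-evaluation-harness command line, assuming the 0.4-series `lm_eval` CLI and its `hf` backend. The helper `build_lm_eval_command` is hypothetical, and it only builds the argument list without running anything.

```python
def build_lm_eval_command(model_id, tasks, batch_size=16):
    """Build an lm_eval argument list for a zero-shot, 4-bit run (assumed settings)."""
    return [
        "lm_eval",
        "--model", "hf",
        # load_in_4bit is forwarded to the Hugging Face loader (needs bitsandbytes).
        "--model_args", f"pretrained={model_id},load_in_4bit=True",
        "--tasks", ",".join(tasks),
        "--num_fewshot", "0",
        "--batch_size", str(batch_size),
    ]


cmd = build_lm_eval_command(
    "EphAsad/Atem-v1-1.5B",
    ["arc_challenge", "gsm8k", "hellaswag"],
)
print(" ".join(cmd))
```

Running the same command with `Qwen/Qwen2.5-1.5B-Instruct` as the model id would give the matched baseline. For GSM8K, "strict" most likely refers to the harness's strict-match answer filter, though the card does not say so explicitly.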

Comparison vs Qwen2.5-7B-Instruct

To contextualise the GSM8K result, Atem was benchmarked against Qwen2.5-7B-Instruct under the same zero-shot strict evaluation conditions.

Model	Parameters	GSM8K (zero-shot strict)
Qwen2.5-1.5B-Instruct	1.5B	23.0%
Atem v1	1.5B	53.0%
Qwen2.5-7B-Instruct	7B	74.9%

At baseline, the 1.5B model sits 51.9 points below the 7B. After training, Atem sits 21.9 points below — closing approximately 58% of the capability gap between 1.5B and 7B on mathematical reasoning. Atem achieves 71% of Qwen2.5-7B's GSM8K performance at 22% of its parameter count.

Note: Official Qwen2.5-7B-Instruct scores (91.6% GSM8K) use 4-shot chain-of-thought prompting. The 74.9% figure above reflects the same zero-shot strict evaluation format used for Atem, ensuring a fair direct comparison.
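The gap figures quoted above follow directly from the three GSM8K scores in the table. A short check, using only those numbers (variable names are illustrative):

```python
# GSM8K zero-shot strict scores from the comparison table, in percent.
base_score = 23.0   # Qwen2.5-1.5B-Instruct
atem_score = 53.0   # Atem v1
big_score = 74.9    # Qwen2.5-7B-Instruct

gap_before = big_score - base_score                    # distance to the 7B at baseline
gap_after = big_score - atem_score                     # distance to the 7B after training
gap_closed = (gap_before - gap_after) / gap_before     # share of the gap removed
relative_score = atem_score / big_score                # Atem as a fraction of the 7B score

print(f"gap before: {gap_before:.1f} points")          # gap before: 51.9 points
print(f"gap after:  {gap_after:.1f} points")           # gap after:  21.9 points
print(f"gap closed: {gap_closed:.1%}")                 # gap closed: 57.8%
print(f"relative:   {relative_score:.1%}")             # relative:   70.8%
```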

Qualitative Evaluation

Atem was evaluated against Qwen2.5-1.5B-Instruct across 30 domain-representative questions using matched system prompts, ensuring differences in output reflect trained capability rather than prompt engineering.

Domain	Questions	Outcome
Coding	8	Atem stronger — more thorough, better structured, catches edge cases
Mathematics	6	Comparable — both accurate on standard problems
Analytical Reasoning	6	Atem stronger — better structured arguments
General Knowledge	5	Comparable
Language & Logic	5	Atem stronger — correct fallacy identification, greater depth

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-v1-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that checks whether a number is prime."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True
)
print(response)

Unsloth (faster inference)

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-v1-1.5B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Explain the difference between a stack and a queue, with examples."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=1000,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(
    output[0][prompt_length:],
    skip_special_tokens=True
))

Ollama

# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q4_K_M

# Higher quality
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q5_K_M

# Near-lossless
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q8_0

llama.cpp

llama-server -hf EphAsad/Atem-v1-1.5B:Q4_K_M
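Once `llama-server` is running, it exposes an OpenAI-compatible HTTP API. The sketch below queries it with only the Python standard library, assuming the server's default address of `http://localhost:8080`; the helper names `build_chat_request` and `chat` are hypothetical, and the port should be adjusted if the server was started with a different one.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080/v1",
                       model="EphAsad/Atem-v1-1.5B:Q4_K_M"):
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt to the local server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no network; calling chat(...) requires a running server.
request = build_chat_request("What is the capital of France?")
print(request.full_url)
```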

System Prompt

Atem's identity is baked into the chat template and activates automatically when no system message is provided. For manual override:

You are Atem, a precise and analytical reasoning assistant. You approach 
every problem methodically — identifying core concepts, reasoning step by 
step, and arriving at well-supported conclusions. You show your thinking 
clearly and are thorough, direct, and intellectually honest.
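A manual override amounts to placing a system message first in the conversation before the chat template is applied. A minimal sketch follows; the helper `build_messages` and the shortened custom prompt are hypothetical, and the full default prompt above can be passed in the same way.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat message list, with an optional system message placed first."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


# Hypothetical custom prompt used in place of the template's built-in identity.
custom_prompt = "You are Atem. Answer concisely and show each step of your reasoning."

default_messages = build_messages("Is 91 prime?")
override_messages = build_messages("Is 91 prime?", system_prompt=custom_prompt)
print(override_messages)
```

Either list can be handed to `tokenizer.apply_chat_template` as in the Transformers example. With `default_messages`, the template is expected to insert Atem's built-in identity on its own.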

Available Files

File	Size	Description
`model.safetensors`	~3.1 GB	Full bfloat16 merged weights
`Atem-1.5b.Q4_K_M.gguf`	~986 MB	4-bit quantised — recommended
`Atem-1.5b.Q5_K_M.gguf`	~1.1 GB	5-bit quantised
`Atem-1.5b.Q8_0.gguf`	~1.6 GB	8-bit quantised — near-lossless

Known Limitations

No thinking traces (Stage 1 by design). Think tags were stripped from all training data for Stage 1. The model does not produce extended <think> reasoning traces. Stage 2 training will layer this capability on top of the Stage 1 foundation.

Mathematical precision on complex problems. On multi-step calculations, the model may make arithmetic slips in intermediate steps while arriving at a structurally correct approach. Answers to high-stakes mathematical problems should be independently verified.

HellaSwag regression. A 2.4% regression on HellaSwag commonsense completion is observed. This is minor and substantially better than the 16.2% regression produced by the earlier exploratory full fine-tune run, confirming that LoRA preserved base commonsense capability effectively.

Roadmap

Atem v1 establishes the Stage 1 foundation. Planned next steps:

Stage 2: LoRA SFT on curated chain-of-thought data to add thinking trace capability — using Complex_CoT, inverted_reasoning, and reasoning trace columns held out from Stage 1 training
Extended benchmarks: MMLU, BBH, IFEval, WinoGrande, MBPP post-Stage 2
Atem v2: Expanded corpus, further domain coverage

Citation

@misc{atem_v1_2026,
  author       = {Asad, Zain},
  title        = {Atem v1: A 1.5B Reasoning Model via 
                  Multi-Source Knowledge Distillation},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/EphAsad/Atem-v1-1.5B}},
}

Support

If you find this model useful for your research or projects, you can support further development of my datasets and models here:
☕ ko-fi.com/ephraim123