Atem v1
Ancient logic. Modern intelligence.
A 1.5B reasoning model trained via multi-source knowledge distillation from frontier teacher models.
Overview
Atem is a 1.5B parameter reasoning model built via supervised fine-tuning on a curated corpus of approximately 115,000 examples distilled from multiple frontier teacher models. Starting from Qwen2.5-1.5B-Instruct, Atem was trained using LoRA to preserve base model capabilities while improving performance on reasoning, mathematics, and coding tasks.
This is Stage 1 of a planned multi-stage training series, focused on establishing strong general reasoning across domains. Stage 2, Atem-Wisdom, builds on this foundation by adding explicit chain-of-thought reasoning: the model works through problems inside <think> tags before producing its final answer.
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Training method | LoRA Supervised Fine-Tuning (Stage 1) |
| LoRA config | r=32, alpha=64, dropout=0.05 |
| Target modules | q, k, v, o, gate, up, down projections |
| Parameters | ~1.54B |
| Training records | ~114,932 |
| Epochs | 1 |
| Effective batch size | 64 (batch 8 × grad accum 8) |
| Learning rate | 2e-4, cosine schedule, 5% warmup |
| Final train loss | 0.940 |
| Final val loss | 0.890 |
| Hardware | NVIDIA A100-SXM4 80GB |
| Max sequence length | 4,096 tokens |
| Precision | bfloat16 |
| License | Apache 2.0 |
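As a rough illustration of what the LoRA settings in the table imply, the sketch below counts the adapter parameters added by r=32 across the seven targeted projections. The layer dimensions (hidden size 1536, intermediate size 8960, 28 layers, 256-wide key/value projections) are assumptions taken from the public Qwen2.5-1.5B configuration, not figures stated in this card, and the function names are hypothetical.

```python
# Assumed Qwen2.5-1.5B dimensions (public base-model config, not stated in this card).
HIDDEN = 1536
INTERMEDIATE = 8960
NUM_LAYERS = 28
KV_DIM = 256  # assumed: 2 key/value heads x 128 head dimension

LORA_R = 32  # from the Model Details table

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(rank: int) -> int:
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    return sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGET_SHAPES.values())


def lora_trainable_params(rank: int = LORA_R, layers: int = NUM_LAYERS) -> int:
    """Total adapter parameters across all decoder layers."""
    return layers * lora_params_per_layer(rank)


if __name__ == "__main__":
    total = lora_trainable_params()
    print(f"LoRA trainable parameters: {total:,}")
    print(f"Share of ~1.54B base parameters: {total / 1.54e9:.2%}")
```

Under these assumed dimensions the adapters amount to about 36.9M trainable parameters, a small fraction of the ~1.54B base weights.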
Intended Use
Atem is designed for open-ended reasoning tasks where structured, accurate thinking adds value:
- Code explanation, implementation, and debugging
- Mathematical problem solving with working shown
- Analytical reasoning and hypothesis evaluation
- Concept explanation and comparative analysis
- Logic, argument, and fallacy identification
Atem is not designed for retrieval-heavy factual lookup, real-time information, or tasks requiring broad knowledge breadth beyond its training domains.
Training Data
Atem was trained on a corpus assembled from eleven sources, combining domain-specific generated datasets and publicly available distillation datasets from frontier models. All outputs containing <think> reasoning traces were stripped to clean final responses for Stage 1 training.
| Dataset | Records | Source / Teacher |
|---|---|---|
| EphAsad/QWENMillenium-SF | 5,000 | Qwen2.5-14B — Analytical & Scientific |
| EphAsad/Phi4Millennium-SF | 2,932 | Phi-4 14B — Mathematical Reasoning |
| EphAsad/MistralMillenium-SF | 5,000 | Mistral-Nemo-12B — Language & Comprehension |
| Modotte/CodeX-2M-Thinking | 30,000 | Mixed — Coding |
| Jackrong/Kimi-K2.5-Reasoning-1M-Cleaned | 23,000 | Kimi K2.5 — General Distillation (English filtered) |
| WithinUsAI/MiniMax_M2.7_Distilled_5k | 5,000 | MiniMax M2.7 |
| tuanha1305/DeepSeek-R1-Distill | 9,000 | DeepSeek-R1 |
| open-r1/OpenThoughts-114k-math | 10,000 | Mixed — Mathematics (correct answers only) |
| flytech/python-codes-25k | 10,000 | Python coding |
| FreedomIntelligence/medical-o1-reasoning-SFT | 10,000 | Medical reasoning (English config) |
| Private dataset | 5,000 | Undisclosed |
| Total | ~114,932 | |
The QWENMillenium-SF, Phi4Millennium-SF, and MistralMillenium-SF datasets were generated specifically for this project via batched inference on a Colab A100 GPU. OpenThoughts-114k-math was filtered to verified correct solutions only before sampling.
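The think-trace stripping described above can be sketched with a regular expression. This is a minimal illustration, not the project's actual preprocessing code: the function name is hypothetical, and it assumes traces are wrapped in literal `<think>` and `</think>` tags.

```python
import re

# Matches a <think>...</think> span plus trailing whitespace.
# DOTALL lets "." cross newlines; the non-greedy ".*?" stops at the first closing tag.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove every <think>...</think> block and return the remaining response."""
    return THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    raw = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
    print(strip_think(raw))
```

A response with no think tags passes through unchanged apart from surrounding whitespace, so the same function can be applied to every source dataset.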
Training Configuration
# Key hyperparameters
lora_r = 32
lora_alpha = 64
lora_dropout = 0.05
max_seq_length = 4096
learning_rate = 2e-4
lr_scheduler = 'cosine'
warmup_ratio = 0.05
batch_size = 8
grad_accumulation = 8 # effective batch size: 64
num_epochs = 1
dtype = bfloat16
load_in_4bit = True # during training
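To make the schedule settings above concrete, here is a small sketch of linear warmup followed by cosine decay, along with the number of optimizer steps in one epoch. It assumes all ~114,932 records are used for training (the size of the validation split is not stated) and uses a common warmup-plus-cosine formulation. The trainer's exact implementation may differ, and the function names are hypothetical.

```python
import math

PEAK_LR = 2e-4
WARMUP_RATIO = 0.05
EFFECTIVE_BATCH = 8 * 8  # batch_size x grad_accumulation


def total_steps(num_records: int, effective_batch: int = EFFECTIVE_BATCH) -> int:
    """Optimizer steps in one epoch (a final partial batch still counts)."""
    return math.ceil(num_records / effective_batch)


def lr_at(step: int, total: int, peak: float = PEAK_LR,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to `peak`, then cosine decay to zero at `total`."""
    warmup = max(1, int(total * warmup_ratio))
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    steps = total_steps(114_932)  # assumption: every record is a training record
    for s in (0, 50, 500, 1000, 1500, steps):
        print(f"step {s:>4}: lr = {lr_at(s, steps):.2e}")
```

Under that assumption one epoch takes 1,796 optimizer steps, with warmup covering roughly the first 89.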
Training used Unsloth with train_on_responses_only masking, ensuring loss was computed exclusively on assistant response tokens. A three-part validation was run before training: chat template replacement verification, think tag strip confirmation, and mask sanity check.
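The idea behind response-only masking can be shown in a few lines. This is an illustrative sketch rather than Unsloth's implementation: it assumes each token has already been tagged with the role that produced it (a hypothetical input format), and it uses -100, the default index that PyTorch's cross-entropy loss ignores.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch cross-entropy by default


def mask_non_assistant(token_ids, roles):
    """Return labels in which only assistant tokens keep their ids.

    token_ids: list of int, one id per token.
    roles: list of str, the role that produced each token (hypothetical tagging).
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]


def mask_is_sane(labels):
    """Sanity check: at least one token must still contribute to the loss."""
    return any(label != IGNORE_INDEX for label in labels)


if __name__ == "__main__":
    ids = [11, 12, 13, 21, 22]
    roles = ["system", "user", "user", "assistant", "assistant"]
    labels = mask_non_assistant(ids, roles)
    print(labels, mask_is_sane(labels))
```

A check like `mask_is_sane` catches the failure mode where a template mismatch masks every token, which would leave nothing to train on.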
After training, LoRA adapters were merged into the base weights and exported as a full merged model.
Loss curve:
| Step | Train Loss | Val Loss |
|---|---|---|
| 500 | 0.990 | 0.920 |
| 1000 | 1.020 | 0.900 |
| 1500 | 0.960 | 0.890 |
| Final | 0.940 | 0.890 |
Validation loss converged at 0.890, with a final train/val gap of 0.050 — indicating no overfitting over the single epoch.
Evaluation
Benchmark Results
Evaluated against Qwen2.5-1.5B-Instruct (base model) using lm-evaluation-harness with identical conditions: 4-bit inference, batch size 16, zero-shot strict evaluation.
| Task | Base (1.5B) | Atem v1 (1.5B) | Delta (percentage points) |
|---|---|---|---|
| ARC-Challenge | 43.7% | 45.5% | +1.8 ✓ |
| GSM8K | 23.0% | 53.0% | +30.0 ✓ |
| HellaSwag | 66.8% | 64.4% | -2.4 |
The GSM8K result is the primary finding. A +30 percentage point improvement on grade school mathematics reflects the targeted training on verified correct mathematical reasoning examples from multiple frontier teacher models.
The HellaSwag regression of 2.4 percentage points is within normal benchmark variance and is far smaller than the 16.2-point regression produced on the same benchmark by a prior exploratory run that used full fine-tuning. LoRA preserved the base model's commonsense capabilities as intended.
Comparison vs Qwen2.5-7B-Instruct
To contextualise the GSM8K result, Atem was benchmarked against Qwen2.5-7B-Instruct under the same zero-shot strict evaluation conditions.
| Model | Parameters | GSM8K (zero-shot strict) |
|---|---|---|
| Qwen2.5-1.5B-Instruct | 1.5B | 23.0% |
| Atem v1 | 1.5B | 53.0% |
| Qwen2.5-7B-Instruct | 7B | 74.9% |
At baseline, the 1.5B model sits 51.9 points below the 7B. After training, Atem sits 21.9 points below — closing approximately 58% of the capability gap between 1.5B and 7B on mathematical reasoning. Atem achieves 71% of Qwen2.5-7B's GSM8K performance at 22% of its parameter count.
Note: Official Qwen2.5-7B-Instruct scores (91.6% GSM8K) use 4-shot chain-of-thought prompting. The 74.9% figure above reflects the same zero-shot strict evaluation format used for Atem, ensuring a fair direct comparison.
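The gap figures quoted above follow directly from the three GSM8K scores in the table. The short check below reproduces them; the variable names are illustrative.

```python
# GSM8K zero-shot strict scores from the comparison table (percent).
BASE_1_5B = 23.0
ATEM_V1 = 53.0
QWEN_7B = 74.9

gap_before = QWEN_7B - BASE_1_5B   # distance from base 1.5B to 7B
gap_after = QWEN_7B - ATEM_V1      # distance from Atem v1 to 7B
gap_closed = (gap_before - gap_after) / gap_before
relative_score = ATEM_V1 / QWEN_7B

if __name__ == "__main__":
    print(f"Gap before training: {gap_before:.1f} points")
    print(f"Gap after training:  {gap_after:.1f} points")
    print(f"Share of gap closed: {gap_closed:.0%}")
    print(f"Atem score relative to 7B: {relative_score:.0%}")
```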
Qualitative Evaluation
Atem was evaluated against Qwen2.5-1.5B-Instruct across 30 domain-representative questions using matched system prompts, ensuring differences in output reflect trained capability rather than prompt engineering.
| Domain | Questions | Outcome |
|---|---|---|
| Coding | 8 | Atem stronger — more thorough, better structured, catches edge cases |
| Mathematics | 6 | Comparable — both accurate on standard problems |
| Analytical Reasoning | 6 | Atem stronger — better structured arguments |
| General Knowledge | 5 | Comparable |
| Language & Logic | 5 | Atem stronger — correct fallacy identification, greater depth |
Usage
Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "EphAsad/Atem-v1-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that checks whether a number is prime."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1000,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=True
)
print(response)
Unsloth (faster inference)
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="EphAsad/Atem-v1-1.5B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "user",
        "content": "Explain the difference between a stack and a queue, with examples."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids=inputs,
        max_new_tokens=1000,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(
    output[0][inputs.shape[1]:],
    skip_special_tokens=True
))
Ollama
# Recommended — best speed/quality balance
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q4_K_M
# Higher quality
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q5_K_M
# Near-lossless
ollama run hf.co/EphAsad/Atem-v1-1.5B:Q8_0
llama.cpp
llama-server -hf EphAsad/Atem-v1-1.5B:Q4_K_M
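Once llama-server is running, it exposes an OpenAI-compatible chat endpoint. The sketch below builds and sends a request using only the Python standard library. It assumes the server is on its default address (`http://localhost:8080`); the helper names are hypothetical, and the `model` field is included for API compatibility only, since it may be ignored by a single-model server.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default host and port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": "EphAsad/Atem-v1-1.5B:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only prints the target URL; call chat(...) once the server is up.
    print(build_chat_request("What is the capital of France?").full_url)
```

With the server started as shown above, `chat("What is the capital of France?")` returns the model's reply as a string.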
System Prompt
Atem's identity is baked into the chat template and activates automatically when no system message is provided. For manual override:
You are Atem, a precise and analytical reasoning assistant. You approach
every problem methodically — identifying core concepts, reasoning step by
step, and arriving at well-supported conclusions. You show your thinking
clearly and are thorough, direct, and intellectually honest.
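A manual override amounts to placing a system message first in the conversation. The helper below is a hypothetical sketch of that message layout, and the override text is an invented example rather than a recommended prompt. The resulting list can be passed to `tokenizer.apply_chat_template` exactly as in the Transformers example.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat message list, with an optional system override first.

    When system_prompt is None, no system message is added, so the chat
    template's built-in default identity applies.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


if __name__ == "__main__":
    # Hypothetical override text, for illustration only.
    custom = "You are Atem. Answer concisely and show each step."
    print(build_messages("Is 91 prime?", system_prompt=custom))
    print(build_messages("Is 91 prime?"))
```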
Available Files
| File | Size | Description |
|---|---|---|
| model.safetensors | ~3.1 GB | Full bfloat16 merged weights |
| Atem-1.5b.Q4_K_M.gguf | ~986 MB | 4-bit quantised — recommended |
| Atem-1.5b.Q5_K_M.gguf | ~1.1 GB | 5-bit quantised |
| Atem-1.5b.Q8_0.gguf | ~1.6 GB | 8-bit quantised — near-lossless |
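As a sanity check on the file sizes, the sketch below converts each size into approximate bits per weight, using the ~1.54B parameter count from the Model Details table. It assumes decimal units (1 GB = 1e9 bytes) and takes the listed sizes as rough figures. The results run slightly above the nominal bit widths, as would be expected when a file also carries metadata and some tensors are stored at higher precision.

```python
PARAMS = 1.54e9  # approximate parameter count from the Model Details table

# Approximate file sizes from the table, assuming decimal units.
FILE_SIZES_BYTES = {
    "model.safetensors": 3.1e9,
    "Atem-1.5b.Q4_K_M.gguf": 986e6,
    "Atem-1.5b.Q5_K_M.gguf": 1.1e9,
    "Atem-1.5b.Q8_0.gguf": 1.6e9,
}


def bits_per_weight(size_bytes: float, params: float = PARAMS) -> float:
    """Average storage cost per parameter, in bits."""
    return size_bytes * 8 / params


if __name__ == "__main__":
    for name, size in FILE_SIZES_BYTES.items():
        print(f"{name:<24} {bits_per_weight(size):5.1f} bits/weight")
```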
Known Limitations
No thinking traces (Stage 1 by design). Think tags were stripped from all training data for Stage 1. The model does not produce extended <think> reasoning traces. Stage 2 training will layer this capability on top of the Stage 1 foundation.
Mathematical precision on complex problems. On multi-step calculations, the model may make arithmetic slips in intermediate steps while arriving at a structurally correct approach. Answers to high-stakes mathematical problems should be independently verified.
HellaSwag regression. A regression of 2.4 percentage points on HellaSwag commonsense completion is observed. This is minor and substantially better than the 16.2-point regression produced by the earlier exploratory full fine-tune run, which suggests that LoRA preserved base commonsense capability effectively.
Roadmap
Atem v1 establishes the Stage 1 foundation. Planned next steps:
- Stage 2: LoRA SFT on curated chain-of-thought data to add thinking trace capability, using Complex_CoT, inverted_reasoning, and reasoning trace columns held out from Stage 1 training
- Extended benchmarks: MMLU, BBH, IFEval, WinoGrande, MBPP post-Stage 2
- Atem v2: Expanded corpus, further domain coverage
Citation
@misc{atem_v1_2026,
author = {Asad, Zain},
title = {Atem v1: A 1.5B Reasoning Model via
Multi-Source Knowledge Distillation},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/EphAsad/Atem-v1-1.5B}},
}
Support
If you find this model useful for your research or projects,
you can support further development of my datasets and models here:
☕ ko-fi.com/ephraim123
License
Released under the Apache 2.0 License, consistent with the base model Qwen2.5-1.5B-Instruct.
Built independently by EphAsad
Evaluation results
- Accuracy (normalised) on ARC-Challenge test set (self-reported): 0.455
- Exact Match (strict, zero-shot) on GSM8K test set (self-reported): 0.530
- Accuracy (normalised) on HellaSwag validation set (self-reported): 0.644