ATLES-1.5B

A 1.5B-parameter chat and coding assistant built by merging Qwen2.5-Coder-1.5B-Instruct with a custom reasoning-tuned Qwen2.5 model using SLERP (spherical linear interpolation), then fine-tuned on coding and conversational data.

What is ATLES?

ATLES is a compact AI assistant that combines coding expertise with conversational ability. Despite having only 1.5B parameters, it can:

  • Write and explain code (Python, JavaScript, Bash, and more)
  • Debug errors and find bugs
  • Explain technical concepts clearly
  • Have natural conversations
  • Follow instructions

How It Was Made

  1. Base models: Qwen2.5-Coder-1.5B-Instruct + a custom Qwen2.5-1.5B fine-tuned for reasoning (74% ARC-Easy)
  2. SLERP merge: Layer-wise spherical interpolation — attention layers favor the reasoning model, MLP layers favor the coder
  3. Fine-tuning: 3 epochs on ~3,500 examples (coding tasks + identity/conversation), with cosine LR schedule and 8-bit Adam
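The SLERP step in the merge interpolates each pair of weight tensors along the great-circle arc between them rather than along a straight line, which tends to preserve the geometry of both parents better than plain averaging. A minimal sketch of the per-tensor operation (the exact merge tooling and per-layer interpolation factors are not published, so the `t` value is illustrative):

```python
import torch

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns a, t=1 returns b; intermediate values follow the
    great-circle arc between the (flattened) tensors. In a layer-wise
    merge, t is chosen per layer: here, attention layers would use a t
    favoring the reasoning model, MLP layers a t favoring the coder.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_n = a_flat / (a_flat.norm() + eps)
    b_n = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0)
    omega = torch.arccos(dot)  # angle between the two tensors
    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a_flat \
        + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)
```

Applied independently to every matching tensor in the two checkpoints, this yields the merged model that is then fine-tuned.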

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places layers on GPU if one is available
model = AutoModelForCausalLM.from_pretrained("spartan8806/ATLES-1.5B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("spartan8806/ATLES-1.5B")

# Build the prompt using the model's chat template
messages = [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens, then decode only the generated portion (skip the prompt)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Details

Parameter           Value
Base architecture   Qwen2.5-1.5B (28 layers, 1536 hidden dim, 12 attention heads)
Merge method        SLERP with layer-wise interpolation gradients
Fine-tune epochs    3
Learning rate       2e-5 with cosine decay
Final loss          1.54
Training data       ~3,500 examples (coding + conversation)
Hardware            NVIDIA RTX 3060 12GB
Training time       ~53 minutes
Precision           bfloat16
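The cosine-decay schedule above can be written out explicitly. A small sketch (the warmup length is an assumption for illustration; the card does not state one):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0, warmup=0):
    """Cosine-decay learning rate starting from base_lr (2e-5 above).

    Linear warmup for the first `warmup` steps (assumed, not stated),
    then cosine decay from base_lr down to min_lr over the remainder.
    """
    if step < warmup:
        return base_lr * step / max(warmup, 1)
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At step 0 this returns the full 2e-5, halfway through it returns 1e-5, and it decays smoothly to min_lr by the final step.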

Benchmarks

Tested on an 8-question eval suite covering identity, coding, reasoning, debugging, conversation, and instruction following:

Model                                Score     Speed
ATLES-1.5B (this model)              Strong    52 tok/s
Qwen2.5-Coder-1.5B-Instruct (base)   Baseline  50 tok/s

Limitations

  • 1.5B parameters means limited capacity for complex multi-step reasoning
  • Can occasionally hallucinate facts
  • Best suited for coding assistance and technical conversation
  • English-focused

License

Apache 2.0 (following the Qwen2.5 license)
