ATLES-1.5B

A 1.5B-parameter chat and coding assistant built by merging Qwen2.5-Coder-1.5B-Instruct with a custom reasoning-tuned Qwen2.5 model using SLERP (spherical linear interpolation), then fine-tuned on coding and conversational data.

What is ATLES?

ATLES is a compact AI assistant that combines coding expertise with conversational ability. Despite having only 1.5B parameters, it can:

  • Write and explain code (Python, JavaScript, Bash, and more)
  • Debug errors and find bugs
  • Explain technical concepts clearly
  • Have natural conversations
  • Follow instructions

How It Was Made

  1. Base models: Qwen2.5-Coder-1.5B-Instruct + a custom Qwen2.5-1.5B fine-tuned for reasoning (74% ARC-Easy)
  2. SLERP merge: Layer-wise spherical interpolation — attention layers favor the reasoning model, MLP layers favor the coder
  3. Fine-tuning: 3 epochs on ~3,500 examples (coding tasks + identity/conversation), with cosine LR schedule and 8-bit Adam
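The SLERP step in the merge interpolates each pair of weight tensors along the great-circle arc between them rather than along a straight line, which tends to preserve the geometry of both parents better than plain averaging. A minimal sketch of the per-tensor operation (the exact merge tooling and per-layer interpolation factors are not published, so the `t` value is illustrative):

```python
import torch

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    t=0 returns a, t=1 returns b; intermediate values follow the
    great-circle arc between the (flattened) tensors. In a layer-wise
    merge, t is chosen per layer: here, attention layers would use a t
    favoring the reasoning model, MLP layers a t favoring the coder.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_n = a_flat / (a_flat.norm() + eps)
    b_n = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(torch.dot(a_n, b_n), -1.0, 1.0)
    omega = torch.arccos(dot)  # angle between the two tensors
    if omega.abs() < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        return (1 - t) * a + t * b
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * a_flat \
        + (torch.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape).to(a.dtype)
```

Applied independently to every matching tensor in the two checkpoints, this yields the merged model that is then fine-tuned.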

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer; device_map="auto" places layers on GPU if one is available
model = AutoModelForCausalLM.from_pretrained("spartan8806/ATLES-1.5B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("spartan8806/ATLES-1.5B")

# Build the prompt using the model's chat template
messages = [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens, then decode only the generated portion (skip the prompt)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Training Details

Parameter           Value
Base architecture   Qwen2.5-1.5B (28 layers, 1536 hidden dim, 12 attention heads)
Merge method        SLERP with layer-wise interpolation gradients
Fine-tune epochs    3
Learning rate       2e-5 with cosine decay
Final loss          1.54
Training data       ~3,500 examples (coding + conversation)
Hardware            NVIDIA RTX 3060 12GB
Training time       ~53 minutes
Precision           bfloat16
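The cosine-decay schedule above can be written out explicitly. A small sketch (the warmup length is an assumption for illustration; the card does not state one):

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0, warmup=0):
    """Cosine-decay learning rate starting from base_lr (2e-5 above).

    Linear warmup for the first `warmup` steps (assumed, not stated),
    then cosine decay from base_lr down to min_lr over the remainder.
    """
    if step < warmup:
        return base_lr * step / max(warmup, 1)
    progress = (step - warmup) / max(total_steps - warmup, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At step 0 this returns the full 2e-5, halfway through it returns 1e-5, and it decays smoothly to min_lr by the final step.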

Benchmarks

Tested on an 8-question eval suite covering identity, coding, reasoning, debugging, conversation, and instruction following:

Model                                Score     Speed
ATLES-1.5B (this model)              Strong    52 tok/s
Qwen2.5-Coder-1.5B-Instruct (base)   Baseline  50 tok/s

Limitations

  • 1.5B parameters means limited capacity for complex multi-step reasoning
  • Can occasionally hallucinate facts
  • Best suited for coding assistance and technical conversation
  • English-focused

License

Apache 2.0 (following the Qwen2.5 license)
