# ATLES-1.5B
A 1.5B-parameter chat and coding assistant, built by merging Qwen2.5-Coder-1.5B-Instruct with a custom reasoning-tuned Qwen2.5 model using SLERP (Spherical Linear Interpolation), then fine-tuning the merged model on coding and conversational data.
## What is ATLES?
ATLES is a compact AI assistant that combines coding expertise with conversational ability. Despite having only 1.5B parameters, it can:
- Write and explain code (Python, JavaScript, Bash, and more)
- Debug errors and find bugs
- Explain technical concepts clearly
- Have natural conversations
- Follow instructions
## How It Was Made
- Base models: Qwen2.5-Coder-1.5B-Instruct + a custom Qwen2.5-1.5B fine-tuned for reasoning (74% ARC-Easy)
- SLERP merge: Layer-wise spherical interpolation — attention layers favor the reasoning model, MLP layers favor the coder
- Fine-tuning: 3 epochs on ~3,500 examples (coding tasks + identity/conversation), with cosine LR schedule and 8-bit Adam
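For intuition, the SLERP step above can be sketched as interpolating each pair of weight tensors along the arc between them rather than along a straight line. This is a minimal illustration of the formula only; the actual per-layer interpolation factors (attention layers weighted toward the reasoning model, MLP layers toward the coder) are part of the merge config and are not reproduced here.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    represented here as flat Python lists for simplicity.
    t=0 returns v0, t=1 returns v1."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0)) or eps
    n1 = math.sqrt(sum(b * b for b in v1)) or eps
    cos_theta = max(-1.0, min(1.0, dot / (n0 * n1)))
    theta = math.acos(cos_theta)
    if theta < eps:
        # Nearly parallel weights: plain linear interpolation is fine.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s = math.sin(theta)
    w0 = math.sin((1 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

A layer-wise merge would call `slerp` once per parameter tensor, with `t` chosen by layer type and depth.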
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("spartan8806/ATLES-1.5B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("spartan8806/ATLES-1.5B")

messages = [{"role": "user", "content": "Write a Python function to reverse a linked list."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Training Details
| Parameter | Value |
|---|---|
| Base architecture | Qwen2.5-1.5B (28 layers, 1536 hidden, 12 heads) |
| Merge method | SLERP with layer-wise gradients |
| Fine-tune epochs | 3 |
| Learning rate | 2e-5 with cosine decay |
| Final loss | 1.54 |
| Training data | ~3,500 examples (coding + conversation) |
| Hardware | NVIDIA RTX 3060 12GB |
| Training time | ~53 minutes |
| Precision | bfloat16 |
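The cosine learning-rate decay from the table can be sketched as follows. This is a generic illustration of the schedule shape, not the exact training script; the minimum LR of 0 and the absence of warmup are assumptions.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine-decayed learning rate: starts at base_lr, ends at min_lr."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Halfway through training this yields exactly the midpoint (1e-5 for a 2e-5 base), and the rate flattens out near both ends of the run.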
## Benchmarks
Tested on an 8-question eval suite covering identity, coding, reasoning, debugging, conversation, and instruction following:
| Model | Score | Speed |
|---|---|---|
| ATLES-1.5B (this model) | Strong | 52 tok/s |
| Qwen2.5-Coder-1.5B-Instruct (base) | Baseline | 50 tok/s |
## Limitations
- At 1.5B parameters, capacity for complex multi-step reasoning is limited
- Can occasionally hallucinate facts
- Best suited for coding assistance and technical conversation
- English-focused
## License
Apache 2.0 (following the Qwen2.5 license)