Atomight-V2.1-0.5B-Inference
Atomight-V2.1-0.5B-Inference is an ultra-compact, reasoning-oriented causal language model developed under the Atomight Ecosystem. Built on a Qwen-derived 494M parameter foundation, the model has been refined using GRPO (Group Relative Policy Optimization) reinforcement tuning.
Despite its tiny physical footprint, Atomight-V2.1-0.5B targets highly efficient edge-device reasoning, structured text outputs, lightweight coding assistance, and rapid deployment workflows under severe compute constraints.
🚀 Key Highlights
- Parameter Footprint: ~494M parameters (Loads into ~1GB VRAM at FP16).
- Training Paradigm: GRPO reinforcement learning focusing on high-signal reasoning vectors instead of brute-force dataset scale.
- Edge-Optimized: Designed specifically for low-overhead mobile, local, and browser-based inference loops (Google Colab / Kaggle native workflow).
📊 Evaluation & Benchmark Results
Official evaluations were conducted using the EleutherAI LM Evaluation Harness at FP16 precision.
Core Evaluation Metrics
| Benchmark Task | Metric Typology | Atomight-V2.1-0.5B Score | Focus Domain |
|---|---|---|---|
| ARC-Easy | Accuracy (Normalized) | 59.34% | Scientific Fact Retrieval |
| HellaSwag | Accuracy (Normalized) | 52.35% | Commonsense Reasoning & Next-Sentence Prediction |
| ARC-Challenge | Accuracy (Normalized) | 33.79% | Hard Analytical Exclusion Logic |
| GSM8K (Flexible Extract) | Exact Match (Regex Clean) | 32.45% | Mathematical Thought & Resolution |
| GSM8K (Strict) | Exact Match (Rigid Parse) | 19.79% | Formatted Mathematical Output |
🔍 Comparative Engineering Insights
- Punching Above Weight Classes: Atomight-V2.1-0.5B outpaces Meta's larger Llama-3.2-1B-Instruct on localized logic-retrieval metrics, clearing 59.3% on ARC-Easy and 33.8% on ARC-Challenge compared to Llama's 56.7% and 31.8% respectively.
- The Reasoning Gap: On mathematical reasoning (GSM8K), when evaluated with Flexible Extraction parsing (32.45%), Atomight demonstrates higher raw mathematical accuracy than both Qwen2.5-0.5B-Instruct (26.8%) and Llama-3.2-1B-Instruct (24.4%).
- The Formatting Note: The delta between Atomight's Strict Math score (19.8%) and Flexible Math score (32.5%) stems from the internal reasoning tokens generated during the inference step. While the mathematical conclusion is correct nearly 1/3 of the time, the model frequently bypasses rigid formatting constraints in favor of dense thinking traces.
💻 Quickstart: Inference Execution
Atomight utilizes system and sequence prompts to partition thinking spaces. For optimal reasoning convergence, use explicit <thinking> and <answer> encapsulation layers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.float16,
device_map="auto"
)
# Structuring system guidelines for GRPO activation
messages = [
{
"role": "system",
"content": "You are a reasoning model. Think inside <thinking> and answer inside <answer>."
},
{
"role": "user",
"content": "A farmer has 12 apples. He gives 4 to his neighbor and loses 2 on the way home. How many apples does he have left?"
}
]
inputs = tokenizer.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_tensors="pt"
).to("cuda")
with torch.no_grad():
outputs = model.generate(
inputs,
max_new_tokens=250,
temperature=0.01,
pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Downloads last month
- -