Atomight-V2.1-0.5B-Inference

Atomight Logo

Atomight-V2.1-0.5B-Inference is an ultra-compact, reasoning-oriented causal language model developed under the Atomight Ecosystem. Built on a Qwen-derived 494M parameter foundation, the model has been refined using GRPO (Group Relative Policy Optimization) reinforcement tuning.

Despite its tiny physical footprint, Atomight-V2.1-0.5B targets highly efficient edge-device reasoning, structured text outputs, lightweight coding assistance, and rapid deployment workflows under severe compute constraints.

🚀 Key Highlights

Parameter Footprint: ~494M parameters (Loads into ~1GB VRAM at FP16).
Training Paradigm: GRPO reinforcement learning focusing on high-signal reasoning vectors instead of brute-force dataset scale.
Edge-Optimized: Designed specifically for low-overhead mobile, local, and browser-based inference loops (Google Colab / Kaggle native workflow).

📊 Evaluation & Benchmark Results

Official evaluations were conducted using the EleutherAI LM Evaluation Harness at FP16 precision.

Core Evaluation Metrics

Benchmark Task	Metric Typology	Atomight-V2.1-0.5B Score	Focus Domain
ARC-Easy	Accuracy (Normalized)	59.34%	Scientific Fact Retrieval
HellaSwag	Accuracy (Normalized)	52.35%	Commonsense Reasoning & Next-Sentence Prediction
ARC-Challenge	Accuracy (Normalized)	33.79%	Hard Analytical Exclusion Logic
GSM8K (Flexible Extract)	Exact Match (Regex Clean)	32.45%	Mathematical Thought & Resolution
GSM8K (Strict)	Exact Match (Rigid Parse)	19.79%	Formatted Mathematical Output

Atomight V2.1 Benchmark

🔍 Comparative Engineering Insights

Punching Above Weight Classes: Atomight-V2.1-0.5B outpaces Meta's larger Llama-3.2-1B-Instruct on localized logic-retrieval metrics, clearing 59.3% on ARC-Easy and 33.8% on ARC-Challenge compared to Llama's 56.7% and 31.8% respectively.
The Reasoning Gap: On mathematical reasoning (GSM8K), when evaluated with Flexible Extraction parsing (32.45%), Atomight demonstrates higher raw mathematical accuracy than both Qwen2.5-0.5B-Instruct (26.8%) and Llama-3.2-1B-Instruct (24.4%).
The Formatting Note: The delta between Atomight's Strict Math score (19.8%) and Flexible Math score (32.5%) stems from the internal reasoning tokens generated during the inference step. While the mathematical conclusion is correct nearly 1/3 of the time, the model frequently bypasses rigid formatting constraints in favor of dense thinking traces.

💻 Quickstart: Inference Execution

Atomight utilizes system and sequence prompts to partition thinking spaces. For optimal reasoning convergence, use explicit <thinking> and <answer> encapsulation layers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto"
)

# Structuring system guidelines for GRPO activation
messages = [
    {
        "role": "system", 
        "content": "You are a reasoning model. Think inside <thinking> and answer inside <answer>."
    },
    {
        "role": "user", 
        "content": "A farmer has 12 apples. He gives 4 to his neighbor and loses 2 on the way home. How many apples does he have left?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    outputs = model.generate(
        inputs, 
        max_new_tokens=250, 
        temperature=0.01,
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downloads last month: -

Safetensors

Model size

0.5B params

Tensor type

BF16

Model tree for NovatasticRoScript/Atomight-V2.1-0.5B-Inference

Base model

Qwen/Qwen2.5-0.5B

Finetuned

(628)

this model

Quantizations

1 model

NovatasticRoScript
/

Atomight-V2.1-0.5B-Inference