Atomight-V2.1-0.5B-Inference

Atomight Logo

Atomight-V2.1-0.5B-Inference is an ultra-compact, reasoning-oriented causal language model developed under the Atomight Ecosystem. Built on a Qwen-derived 494M parameter foundation, the model has been refined using GRPO (Group Relative Policy Optimization) reinforcement tuning.

Despite its tiny physical footprint, Atomight-V2.1-0.5B targets highly efficient edge-device reasoning, structured text outputs, lightweight coding assistance, and rapid deployment workflows under severe compute constraints.

🚀 Key Highlights

  • Parameter Footprint: ~494M parameters (Loads into ~1GB VRAM at FP16).
  • Training Paradigm: GRPO reinforcement learning focusing on high-signal reasoning vectors instead of brute-force dataset scale.
  • Edge-Optimized: Designed specifically for low-overhead mobile, local, and browser-based inference loops (Google Colab / Kaggle native workflow).

📊 Evaluation & Benchmark Results

Official evaluations were conducted using the EleutherAI LM Evaluation Harness at FP16 precision.

Core Evaluation Metrics

Benchmark Task Metric Typology Atomight-V2.1-0.5B Score Focus Domain
ARC-Easy Accuracy (Normalized) 59.34% Scientific Fact Retrieval
HellaSwag Accuracy (Normalized) 52.35% Commonsense Reasoning & Next-Sentence Prediction
ARC-Challenge Accuracy (Normalized) 33.79% Hard Analytical Exclusion Logic
GSM8K (Flexible Extract) Exact Match (Regex Clean) 32.45% Mathematical Thought & Resolution
GSM8K (Strict) Exact Match (Rigid Parse) 19.79% Formatted Mathematical Output

Atomight V2.1 Benchmark

🔍 Comparative Engineering Insights

  • Punching Above Weight Classes: Atomight-V2.1-0.5B outpaces Meta's larger Llama-3.2-1B-Instruct on localized logic-retrieval metrics, clearing 59.3% on ARC-Easy and 33.8% on ARC-Challenge compared to Llama's 56.7% and 31.8% respectively.
  • The Reasoning Gap: On mathematical reasoning (GSM8K), when evaluated with Flexible Extraction parsing (32.45%), Atomight demonstrates higher raw mathematical accuracy than both Qwen2.5-0.5B-Instruct (26.8%) and Llama-3.2-1B-Instruct (24.4%).
  • The Formatting Note: The delta between Atomight's Strict Math score (19.8%) and Flexible Math score (32.5%) stems from the internal reasoning tokens generated during the inference step. While the mathematical conclusion is correct nearly 1/3 of the time, the model frequently bypasses rigid formatting constraints in favor of dense thinking traces.

💻 Quickstart: Inference Execution

Atomight utilizes system and sequence prompts to partition thinking spaces. For optimal reasoning convergence, use explicit <thinking> and <answer> encapsulation layers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto"
)

# Structuring system guidelines for GRPO activation
messages = [
    {
        "role": "system", 
        "content": "You are a reasoning model. Think inside <thinking> and answer inside <answer>."
    },
    {
        "role": "user", 
        "content": "A farmer has 12 apples. He gives 4 to his neighbor and loses 2 on the way home. How many apples does he have left?"
    }
]

inputs = tokenizer.apply_chat_template(
    messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors="pt"
).to("cuda")

with torch.no_grad():
    outputs = model.generate(
        inputs, 
        max_new_tokens=250, 
        temperature=0.01,
        pad_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Downloads last month
-
Safetensors
Model size
0.5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for NovatasticRoScript/Atomight-V2.1-0.5B-Inference

Finetuned
(628)
this model
Quantizations
1 model

Datasets used to train NovatasticRoScript/Atomight-V2.1-0.5B-Inference