NovatasticRoScript
/

Atomight-V2.1-0.5B-Inference

@@ -1,167 +1,107 @@
-**Atomight-V2.1-0.5B-Inference**
-Atomight-V2.1-0.5B-Inference is a compact reasoning-oriented language model developed under the Atomight ecosystem. Built on a Qwen-derived foundation and refined using GRPO-based reinforcement tuning, the model focuses on efficient reasoning, structured outputs, coding capability, and lightweight deployment.
-Despite its small ~0.5B parameter footprint, Atomight-V2.1 demonstrates competitive performance against other small language models across reasoning and commonsense benchmarks.
 ---
-Overview
-- Model Name: **Atomight-V2.1-0.5B-Inference**
-- Parameters: ~494M
-- Architecture Base: Qwen-derived causal language model
-- Training Method: GRPO reinforcement training
-- Primary Focus:
-  - Reasoning
-  - Lightweight inference
-  - Coding capability
-  - Structured responses
-  - Efficient deployment
 ---
-Training Datasets
-Atomight-V2.1 was trained using a curated mix of public reasoning and instruction datasets, including:
-- GSM8K (2000 samples)
-- HumanEval
-- MMLU (2000 samples)
-- ARC-Challenge (AI2 ARC)
-- Bespoke-Stratos-17k (4000 curated samples)
-The training philosophy emphasized:
-- high-signal reasoning samples,
-- compact capability transfer,
-- and reinforcement-based refinement over massive-scale brute-force training.
 ---
-Benchmark Results
-**Official Evaluation** performed using **EleutherAI LM Evaluation Harness**.
-Benchmark| Score
-*ARC-Easy*| **59.3%**
-*HellaSwag*| **52.4%**
-*ARC-Challenge*| **33.8%**
-*GSM8K (Flexible Extract)*| **32.5%**
-*GSM8K (Strict)*| **19.8%**
-Comparative Notes
-Compared against similarly-sized small language models:
-- Competitive with **Qwen2.5-0.5B-Instruct**
-- Competitive with **Llama-3.2-1B-Instruct** on selected reasoning benchmarks
-- Strongest performance observed in:
-  - commonsense reasoning,
-  - structured inference,
-  - and challenge-style QA
 ---
-Example
-def is_palindrome(string: str) -> bool:
-    """Returns True if the string reads the same backward as forward, ignoring case."""
-    cleaned_string = ''.join(
-        char.lower() for char in string
-        if char.isalnum()
     )
-    return cleaned_string == cleaned_string[::-1]
----
-Intended Use
-Atomight-V2.1 is designed for:
-- Lightweight local inference
-- Experimental reasoning systems
-- Educational AI research
-- Small-scale coding assistants
-- Mobile/cloud deployment workflows
-- Efficient fine-tuning experiments
----
-Limitations
-This is still a compact 0.5B-scale language model and has several limitations:
-- Weakness in advanced multi-step arithmetic
-- Inconsistent scientific reasoning on harder benchmarks
-- Occasional verbose reasoning outputs
-- Hallucinations remain possible
-- Not suitable for high-stakes applications
----
-Future Roadmap
-Planned future Atomight developments include:
-- Improved tokenizer optimization
-- Specialist teacher-model distillation
-- UltraMath / UltraCode / UltraThink training branches
-- Hybrid SFT + GRPO pipelines
-- Enhanced reasoning alignment
-- Lightweight deployment optimization
----
-Hardware & Workflow
-Atomight models are developed using a lightweight mobile-first workflow involving:
-- Google Colab
-- Kaggle
-- Hugging Face ecosystem tooling
-This project explores how far compact open models can be pushed under constrained compute environments.
----
-License
-Please refer to the base model license and dataset licenses before commercial or derivative use.
----
-Acknowledgements
-Special thanks to:
-- Qwen
-- DeepSeek
-- Hugging Face
-- EleutherAI
-- Open-source AI research community
----
-Atomight Ecosystem
-Current and planned projects include:
-- Atomight-V2.x
-- Atomight UltraMath
-- Atomight UltraCode
-- Atomight UltraThink
-- AtomightDepict-0.4B-Pixels
----
-Citation
-@misc{atomight_v21,
-  title={Atomight-V2.1-0.5B-Inference},
-  author={NovatasticRoScript},
-  year={2026},
-  publisher={Hugging Face}
-}

 ---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-0.5B
+tags:
+- text-generation
+- causal-lm
+- grpo
+- reasoning
+- reinforcement-learning
+- mini-llm
+datasets:
+- openai/gsm8k
+- openai/openai_humaneval
+- cais/mmlu
+- allenai/ai2_arc
+- alignment-handbook/bespoke-stratos-17k
+language:
+- en
+pipeline_tag: text-generation
+metrics:
+- accuracy
+- exact_match
 ---
+# Atomight-V2.1-0.5B-Inference
+<p align="center">
+  <img src="official_radar_benchmark.png" alt="Atomight Footprint" width="500" style="max-width: 100%;">
+</p>
+**Atomight-V2.1-0.5B-Inference** is an ultra-compact, reasoning-oriented causal language model developed under the **Atomight Ecosystem**. Built on a Qwen-derived 494M parameter foundation, the model has been refined using **GRPO (Group Relative Policy Optimization)** reinforcement tuning.
+Despite its tiny physical footprint, Atomight-V2.1-0.5B targets highly efficient edge-device reasoning, structured text outputs, lightweight coding assistance, and rapid deployment workflows under severe compute constraints.
+### 🚀 Key Highlights
+- **Parameter Footprint:** ~494M parameters (Loads into ~1GB VRAM at FP16).
+- **Training Paradigm:** GRPO reinforcement learning focusing on high-signal reasoning vectors instead of brute-force dataset scale.
+- **Edge-Optimized:** Designed specifically for low-overhead mobile, local, and browser-based inference loops (Google Colab / Kaggle native workflow).
 ---
+## 📊 Evaluation & Benchmark Results
+Official evaluations were conducted using the **EleutherAI LM Evaluation Harness** at FP16 precision.
+### Core Evaluation Metrics
+| Benchmark Task | Metric Typology | Atomight-V2.1-0.5B Score | Focus Domain |
+| :--- | :--- | :--- | :--- |
+| **ARC-Easy** | Accuracy (Normalized) | **59.34%** | Scientific Fact Retrieval |
+| **HellaSwag** | Accuracy (Normalized) | **52.35%** | Commonsense Reasoning & Next-Sentence Prediction |
+| **ARC-Challenge** | Accuracy (Normalized) | **33.79%** | Hard Analytical Exclusion Logic |
+| **GSM8K (Flexible Extract)** | Exact Match (Regex Clean) | **32.45%** | Mathematical Thought & Resolution |
+| **GSM8K (Strict)** | Exact Match (Rigid Parse) | **19.79%** | Formatted Mathematical Output |
+### 🔍 Comparative Engineering Insights
+* **Punching Above Weight Classes:** Atomight-V2.1-0.5B outpaces Meta's larger **Llama-3.2-1B-Instruct** on localized logic-retrieval metrics, clearing **59.3%** on ARC-Easy and **33.8%** on ARC-Challenge compared to Llama's *56.7%* and *31.8%* respectively.
+* **The Reasoning Gap:** On mathematical reasoning (GSM8K), when evaluated with **Flexible Extraction parsing (32.45%)**, Atomight demonstrates higher raw mathematical accuracy than both Qwen2.5-0.5B-Instruct (*26.8%*) and Llama-3.2-1B-Instruct (*24.4%*).
+* **The Formatting Note:** The delta between Atomight's Strict Math score (19.8%) and Flexible Math score (32.5%) stems from the internal reasoning tokens generated during the inference step. While the mathematical conclusion is correct nearly 1/3 of the time, the model frequently bypasses rigid formatting constraints in favor of dense thinking traces.
 ---
+## 💻 Quickstart: Inference Execution
+Atomight utilizes system and sequence prompts to partition thinking spaces. For optimal reasoning convergence, use explicit `<thinking>` and `<answer>` encapsulation layers.
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+# Structuring system guidelines for GRPO activation
+messages = [
+    {
+        "role": "system",
+        "content": "You are a reasoning model. Think inside <thinking> and answer inside <answer>."
+    },
+    {
+        "role": "user",
+        "content": "A farmer has 12 apples. He gives 4 to his neighbor and loses 2 on the way home. How many apples does he have left?"
+    }
+]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to("cuda")
+with torch.no_grad():
+    outputs = model.generate(
+        inputs,
+        max_new_tokens=250,
+        temperature=0.01,
+        pad_token_id=tokenizer.eos_token_id
     )
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))