NovatasticRoScript commited on
Commit
7ae30ec
ยท
verified ยท
1 Parent(s): 4886942

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +88 -148
README.md CHANGED
@@ -1,167 +1,107 @@
1
- **Atomight-V2.1-0.5B-Inference**
2
-
3
- Atomight-V2.1-0.5B-Inference is a compact reasoning-oriented language model developed under the Atomight ecosystem. Built on a Qwen-derived foundation and refined using GRPO-based reinforcement tuning, the model focuses on efficient reasoning, structured outputs, coding capability, and lightweight deployment.
4
-
5
- Despite its small ~0.5B parameter footprint, Atomight-V2.1 demonstrates competitive performance against other small language models across reasoning and commonsense benchmarks.
6
-
7
  ---
8
-
9
- Overview
10
-
11
- - Model Name: **Atomight-V2.1-0.5B-Inference**
12
- - Parameters: ~494M
13
- - Architecture Base: Qwen-derived causal language model
14
- - Training Method: GRPO reinforcement training
15
- - Primary Focus:
16
- - Reasoning
17
- - Lightweight inference
18
- - Coding capability
19
- - Structured responses
20
- - Efficient deployment
21
-
 
 
 
 
 
 
 
22
  ---
23
 
24
- Training Datasets
25
 
26
- Atomight-V2.1 was trained using a curated mix of public reasoning and instruction datasets, including:
 
 
27
 
28
- - GSM8K (2000 samples)
29
- - HumanEval
30
- - MMLU (2000 samples)
31
- - ARC-Challenge (AI2 ARC)
32
- - Bespoke-Stratos-17k (4000 curated samples)
33
 
34
- The training philosophy emphasized:
35
 
36
- - high-signal reasoning samples,
37
- - compact capability transfer,
38
- - and reinforcement-based refinement over massive-scale brute-force training.
 
39
 
40
  ---
41
 
42
- Benchmark Results
43
 
44
- **Official Evaluation** performed using **EleutherAI LM Evaluation Harness**.
45
 
46
- Benchmark| Score
47
- *ARC-Easy*| **59.3%**
48
- *HellaSwag*| **52.4%**
49
- *ARC-Challenge*| **33.8%**
50
- *GSM8K (Flexible Extract)*| **32.5%**
51
- *GSM8K (Strict)*| **19.8%**
 
 
52
 
53
- Comparative Notes
54
 
55
- Compared against similarly-sized small language models:
56
-
57
- - Competitive with **Qwen2.5-0.5B-Instruct**
58
- - Competitive with **Llama-3.2-1B-Instruct** on selected reasoning benchmarks
59
- - Strongest performance observed in:
60
- - commonsense reasoning,
61
- - structured inference,
62
- - and challenge-style QA
63
 
64
  ---
65
 
66
- Example
67
-
68
- def is_palindrome(string: str) -> bool:
69
- """Returns True if the string reads the same backward as forward, ignoring case."""
70
-
71
- cleaned_string = ''.join(
72
- char.lower() for char in string
73
- if char.isalnum()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
74
  )
75
 
76
- return cleaned_string == cleaned_string[::-1]
77
-
78
- ---
79
-
80
- Intended Use
81
-
82
- Atomight-V2.1 is designed for:
83
-
84
- - Lightweight local inference
85
- - Experimental reasoning systems
86
- - Educational AI research
87
- - Small-scale coding assistants
88
- - Mobile/cloud deployment workflows
89
- - Efficient fine-tuning experiments
90
-
91
- ---
92
-
93
- Limitations
94
-
95
- This is still a compact 0.5B-scale language model and has several limitations:
96
-
97
- - Weakness in advanced multi-step arithmetic
98
- - Inconsistent scientific reasoning on harder benchmarks
99
- - Occasional verbose reasoning outputs
100
- - Hallucinations remain possible
101
- - Not suitable for high-stakes applications
102
-
103
- ---
104
-
105
- Future Roadmap
106
-
107
- Planned future Atomight developments include:
108
-
109
- - Improved tokenizer optimization
110
- - Specialist teacher-model distillation
111
- - UltraMath / UltraCode / UltraThink training branches
112
- - Hybrid SFT + GRPO pipelines
113
- - Enhanced reasoning alignment
114
- - Lightweight deployment optimization
115
-
116
- ---
117
-
118
- Hardware & Workflow
119
-
120
- Atomight models are developed using a lightweight mobile-first workflow involving:
121
-
122
- - Google Colab
123
- - Kaggle
124
- - Hugging Face ecosystem tooling
125
-
126
- This project explores how far compact open models can be pushed under constrained compute environments.
127
-
128
- ---
129
-
130
- License
131
-
132
- Please refer to the base model license and dataset licenses before commercial or derivative use.
133
-
134
- ---
135
-
136
- Acknowledgements
137
-
138
- Special thanks to:
139
-
140
- - Qwen
141
- - DeepSeek
142
- - Hugging Face
143
- - EleutherAI
144
- - Open-source AI research community
145
-
146
- ---
147
-
148
- Atomight Ecosystem
149
-
150
- Current and planned projects include:
151
-
152
- - Atomight-V2.x
153
- - Atomight UltraMath
154
- - Atomight UltraCode
155
- - Atomight UltraThink
156
- - AtomightDepict-0.4B-Pixels
157
-
158
- ---
159
-
160
- Citation
161
-
162
- @misc{atomight_v21,
163
- title={Atomight-V2.1-0.5B-Inference},
164
- author={NovatasticRoScript},
165
- year={2026},
166
- publisher={Hugging Face}
167
- }
 
 
 
 
 
 
 
1
  ---
2
+ license: apache-2.0
3
+ base_model: Qwen/Qwen2.5-0.5B
4
+ tags:
5
+ - text-generation
6
+ - causal-lm
7
+ - grpo
8
+ - reasoning
9
+ - reinforcement-learning
10
+ - mini-llm
11
+ datasets:
12
+ - openai/gsm8k
13
+ - openai/openai_humaneval
14
+ - cais/mmlu
15
+ - allenai/ai2_arc
16
+ - alignment-handbook/bespoke-stratos-17k
17
+ language:
18
+ - en
19
+ pipeline_tag: text-generation
20
+ metrics:
21
+ - accuracy
22
+ - exact_match
23
  ---
24
 
25
+ # Atomight-V2.1-0.5B-Inference
26
 
27
+ <p align="center">
28
+ <img src="official_radar_benchmark.png" alt="Atomight Footprint" width="500" style="max-width: 100%;">
29
+ </p>
30
 
31
+ **Atomight-V2.1-0.5B-Inference** is an ultra-compact, reasoning-oriented causal language model developed under the **Atomight Ecosystem**. Built on a Qwen-derived 494M parameter foundation, the model has been refined using **GRPO (Group Relative Policy Optimization)** reinforcement tuning.
 
 
 
 
32
 
33
+ Despite its tiny physical footprint, Atomight-V2.1-0.5B targets highly efficient edge-device reasoning, structured text outputs, lightweight coding assistance, and rapid deployment workflows under severe compute constraints.
34
 
35
+ ### ๐Ÿš€ Key Highlights
36
+ - **Parameter Footprint:** ~494M parameters (Loads into ~1GB VRAM at FP16).
37
+ - **Training Paradigm:** GRPO reinforcement learning focusing on high-signal reasoning vectors instead of brute-force dataset scale.
38
+ - **Edge-Optimized:** Designed specifically for low-overhead mobile, local, and browser-based inference loops (Google Colab / Kaggle native workflow).
39
 
40
  ---
41
 
42
+ ## ๐Ÿ“Š Evaluation & Benchmark Results
43
 
44
+ Official evaluations were conducted using the **EleutherAI LM Evaluation Harness** at FP16 precision.
45
 
46
+ ### Core Evaluation Metrics
47
+ | Benchmark Task | Metric Typology | Atomight-V2.1-0.5B Score | Focus Domain |
48
+ | :--- | :--- | :--- | :--- |
49
+ | **ARC-Easy** | Accuracy (Normalized) | **59.34%** | Scientific Fact Retrieval |
50
+ | **HellaSwag** | Accuracy (Normalized) | **52.35%** | Commonsense Reasoning & Next-Sentence Prediction |
51
+ | **ARC-Challenge** | Accuracy (Normalized) | **33.79%** | Hard Analytical Exclusion Logic |
52
+ | **GSM8K (Flexible Extract)** | Exact Match (Regex Clean) | **32.45%** | Mathematical Thought & Resolution |
53
+ | **GSM8K (Strict)** | Exact Match (Rigid Parse) | **19.79%** | Formatted Mathematical Output |
54
 
55
+ ### ๐Ÿ” Comparative Engineering Insights
56
 
57
+ * **Punching Above Weight Classes:** Atomight-V2.1-0.5B outpaces Meta's larger **Llama-3.2-1B-Instruct** on localized logic-retrieval metrics, clearing **59.3%** on ARC-Easy and **33.8%** on ARC-Challenge compared to Llama's *56.7%* and *31.8%* respectively.
58
+ * **The Reasoning Gap:** On mathematical reasoning (GSM8K), when evaluated with **Flexible Extraction parsing (32.45%)**, Atomight demonstrates higher raw mathematical accuracy than both Qwen2.5-0.5B-Instruct (*26.8%*) and Llama-3.2-1B-Instruct (*24.4%*).
59
+ * **The Formatting Note:** The delta between Atomight's Strict Math score (19.8%) and Flexible Math score (32.5%) stems from the internal reasoning tokens generated during the inference step. While the mathematical conclusion is correct nearly 1/3 of the time, the model frequently bypasses rigid formatting constraints in favor of dense thinking traces.
 
 
 
 
 
60
 
61
  ---
62
 
63
+ ## ๐Ÿ’ป Quickstart: Inference Execution
64
+
65
+ Atomight utilizes system and sequence prompts to partition thinking spaces. For optimal reasoning convergence, use explicit `<thinking>` and `<answer>` encapsulation layers.
66
+
67
+ ```python
68
+ import torch
69
+ from transformers import AutoModelForCausalLM, AutoTokenizer
70
+
71
+ model_id = "NovatasticRoScript/Atomight-V2.1-0.5B-Inference"
72
+
73
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
74
+ model = AutoModelForCausalLM.from_pretrained(
75
+ model_id,
76
+ torch_dtype=torch.float16,
77
+ device_map="auto"
78
+ )
79
+
80
+ # Structuring system guidelines for GRPO activation
81
+ messages = [
82
+ {
83
+ "role": "system",
84
+ "content": "You are a reasoning model. Think inside <thinking> and answer inside <answer>."
85
+ },
86
+ {
87
+ "role": "user",
88
+ "content": "A farmer has 12 apples. He gives 4 to his neighbor and loses 2 on the way home. How many apples does he have left?"
89
+ }
90
+ ]
91
+
92
+ inputs = tokenizer.apply_chat_template(
93
+ messages,
94
+ tokenize=True,
95
+ add_generation_prompt=True,
96
+ return_tensors="pt"
97
+ ).to("cuda")
98
+
99
+ with torch.no_grad():
100
+ outputs = model.generate(
101
+ inputs,
102
+ max_new_tokens=250,
103
+ temperature=0.01,
104
+ pad_token_id=tokenizer.eos_token_id
105
  )
106
 
107
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))