NovatasticRoScript committed on
Commit 4886942 · verified · 1 Parent(s): 91cde09

Update README.md

Files changed (1)
  1. README.md +149 -55
README.md CHANGED
@@ -1,73 +1,167 @@
- ---
- base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
- library_name: transformers
- model_name: results
- tags:
- - generated_from_trainer
- - trl
- - grpo
- - unsloth
- licence: license
- license: mit
- datasets:
- - bespokelabs/Bespoke-Stratos-17k
- language:
- - en
  ---

- # Model Card for results

- This model is a fine-tuned version of [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit).
- It has been trained using [TRL](https://github.com/huggingface/trl).

- ## Quick start

- ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="NovatasticRoScript/results", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```

- ## Training procedure

-

- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

- ### Framework versions

- - TRL: 1.5.0
- - Transformers: 5.9.0
- - Pytorch: 2.10.0
- - Datasets: 4.8.5
- - Tokenizers: 0.22.2

- ## Citations

- Cite GRPO as:

- ```bibtex
- @article{shao2024deepseekmath,
-     title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-     author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-     year = 2024,
-     eprint = {arXiv:2402.03300},
- }
- ```

- Cite TRL as:

- ```bibtex
- @software{vonwerra2020trl,
-     title = {{TRL: Transformers Reinforcement Learning}},
-     author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-     license = {Apache-2.0},
-     url = {https://github.com/huggingface/trl},
-     year = {2020}
- }
- ```
+ **Atomight-V2.1-0.5B-Inference**
+
+ Atomight-V2.1-0.5B-Inference is a compact, reasoning-oriented language model developed as part of the Atomight ecosystem. Built on a Qwen-derived foundation and refined with GRPO-based reinforcement tuning, the model focuses on efficient reasoning, structured outputs, coding capability, and lightweight deployment.
+
+ Despite its small ~0.5B parameter footprint, Atomight-V2.1 is competitive with other small language models on the reasoning and commonsense benchmarks reported below.
+
  ---

+ Overview

+ - Model Name: **Atomight-V2.1-0.5B-Inference**
+ - Parameters: ~494M
+ - Architecture Base: Qwen-derived causal language model
+ - Training Method: GRPO reinforcement training
+ - Primary Focus:
+   - Reasoning
+   - Lightweight inference
+   - Coding capability
+   - Structured responses
+   - Efficient deployment

+ ---

+ Training Datasets

+ Atomight-V2.1 was trained using a curated mix of public reasoning and instruction datasets, including:

+ - GSM8K (2000 samples)
+ - HumanEval
+ - MMLU (2000 samples)
+ - ARC-Challenge (AI2 ARC)
+ - Bespoke-Stratos-17k (4000 curated samples)

+ The training philosophy emphasized:

+ - high-signal reasoning samples,
+ - compact capability transfer,
+ - and reinforcement-based refinement over massive-scale brute-force training.
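
For readers unfamiliar with GRPO, its central step (as described in the DeepSeekMath paper) is to score each sampled completion relative to the other completions drawn for the same prompt. The block below is a minimal sketch of that group-relative normalization only. The function name, the `eps` constant, the choice of population standard deviation, and the example rewards are illustrative assumptions and are not taken from the Atomight training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = mean(rewards)
    # Assumption: population standard deviation; implementations may differ here.
    sigma = pstdev(rewards)
    # eps keeps the division defined when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt (1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to produce a baseline.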

+ ---

+ Benchmark Results

+ Official evaluation was performed using the **EleutherAI LM Evaluation Harness**.

+ | Benchmark | Score |
+ | --- | --- |
+ | *ARC-Easy* | **59.3%** |
+ | *HellaSwag* | **52.4%** |
+ | *ARC-Challenge* | **33.8%** |
+ | *GSM8K (Flexible Extract)* | **32.5%** |
+ | *GSM8K (Strict)* | **19.8%** |
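
The gap between the two GSM8K rows comes from how the final answer is pulled out of the generated text. The sketch below illustrates the general idea with two simplified regular expressions: a strict extractor that only accepts an answer written in GSM8K's `#### <number>` format, and a flexible one that falls back to the last number found anywhere in the text. The patterns, function names, and sample output are assumptions for illustration and are not the harness's actual filters.

```python
import re
from typing import Optional

# Simplified patterns (assumptions): an optionally negative number with
# optional thousands separators and an optional decimal part.
STRICT_PATTERN = re.compile(r"#### (-?[0-9][0-9,]*(?:\.[0-9]+)?)")
NUMBER_PATTERN = re.compile(r"-?[0-9][0-9,]*(?:\.[0-9]+)?")


def _normalize(number: str) -> str:
    """Drop thousands separators so "1,200" and "1200" compare equal."""
    return number.replace(",", "")


def extract_strict(text: str) -> Optional[str]:
    """Return the number following '#### ', or None if that format is absent."""
    match = STRICT_PATTERN.search(text)
    return _normalize(match.group(1)) if match else None


def extract_flexible(text: str) -> Optional[str]:
    """Return the last number that appears anywhere in the text, if any."""
    numbers = NUMBER_PATTERN.findall(text)
    return _normalize(numbers[-1]) if numbers else None


# Hypothetical model output that is correct but ignores the '####' format.
output = "She sells 16 - 3 - 4 = 9 eggs at $2 each, so she makes 18 dollars."
print(extract_strict(output))    # None
print(extract_flexible(output))  # 18
```

Under this reading, a model that reasons correctly but does not follow the expected answer format is credited by the flexible metric and penalized by the strict one.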

+ Comparative Notes

+ Compared against similarly sized small language models:

+ - Competitive with **Qwen2.5-0.5B-Instruct**
+ - Competitive with **Llama-3.2-1B-Instruct** on selected reasoning benchmarks
+ - Strongest performance observed in:
+   - commonsense reasoning,
+   - structured inference,
+   - and challenge-style QA

+ ---
+
+ Example
+
+ def is_palindrome(string: str) -> bool:
+     """Returns True if the string reads the same backward as forward, ignoring case and non-alphanumeric characters."""

+     cleaned_string = ''.join(
+         char.lower() for char in string
+         if char.isalnum()
+     )
+
+     return cleaned_string == cleaned_string[::-1]
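
A quick way to check the example is to call it on a few strings. The block below is a self-contained sketch: the function is restated in compact form so the block runs on its own, and the sample inputs are illustrative rather than taken from the README.

```python
def is_palindrome(string: str) -> bool:
    """Return True if the string is a palindrome, ignoring case and non-alphanumerics."""
    cleaned_string = "".join(char.lower() for char in string if char.isalnum())
    return cleaned_string == cleaned_string[::-1]


# Illustrative inputs (not from the original README).
print(is_palindrome("A man, a plan, a canal: Panama"))  # True
print(is_palindrome("Atomight"))                        # False
print(is_palindrome(""))                                # True
```

Note that the empty string counts as a palindrome here, because an empty sequence equals its own reverse.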
+
+ ---
+
+ Intended Use
+
+ Atomight-V2.1 is designed for:
+
+ - Lightweight local inference
+ - Experimental reasoning systems
+ - Educational AI research
+ - Small-scale coding assistants
+ - Mobile/cloud deployment workflows
+ - Efficient fine-tuning experiments
+
+ ---
+
+ Limitations
+
+ This is still a compact 0.5B-scale language model and has several limitations:
+
+ - Weakness in advanced multi-step arithmetic
+ - Inconsistent scientific reasoning on harder benchmarks
+ - Occasional verbose reasoning outputs
+ - Hallucinations remain possible
+ - Not suitable for high-stakes applications
+
+ ---
+
+ Future Roadmap
+
+ Planned future Atomight developments include:
+
+ - Improved tokenizer optimization
+ - Specialist teacher-model distillation
+ - UltraMath / UltraCode / UltraThink training branches
+ - Hybrid SFT + GRPO pipelines
+ - Enhanced reasoning alignment
+ - Lightweight deployment optimization
+
+ ---
+
+ Hardware & Workflow
+
+ Atomight models are developed using a lightweight mobile-first workflow involving:
+
+ - Google Colab
+ - Kaggle
+ - Hugging Face ecosystem tooling
+
+ This project explores how far compact open models can be pushed under constrained compute environments.
+
+ ---
+
+ License
+
+ Please refer to the base model license and dataset licenses before commercial or derivative use.
+
+ ---
+
+ Acknowledgements
+
+ Special thanks to:
+
+ - Qwen
+ - DeepSeek
+ - Hugging Face
+ - EleutherAI
+ - Open-source AI research community
+
+ ---
+
+ Atomight Ecosystem
+
+ Current and planned projects include:
+
+ - Atomight-V2.x
+ - Atomight UltraMath
+ - Atomight UltraCode
+ - Atomight UltraThink
+ - AtomightDepict-0.4B-Pixels
+
+ ---
+
+ Citation
+
+ @misc{atomight_v21,
+   title={Atomight-V2.1-0.5B-Inference},
+   author={NovatasticRoScript},
+   year={2026},
+   publisher={Hugging Face}
+ }