---
license: apache-2.0
---
# Model Card for CoreX v0.1

This model card documents CoreX v0.1, a lightweight transformer-based language model developed by Nexizan Company. CoreX is optimized for low-memory systems and targets offline AI assistants, coding tutors, and sandbox research.
## Model Details

### Model Description

- **Developed by:** Nexizan Company
- **Funded by:** Self-funded
- **Shared by:** Nexizan Company CoreX team
- **Model type:** Decoder-only Transformer (causal LM)
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** None (trained from scratch)

### Model Sources

- **Repository:** [To be added]
- **Paper:** N/A
- **Demo:** Local chat interface (`chat_interface.py`)
## Uses

### Direct Use

- Conversational assistant (terminal interface)
- Text generation and summarization
- Code and math assistance
- Educational / research sandbox

### Downstream Use

- Fine-tuning for domain-specific tasks (education, productivity, research)
- Integration into private offline-first AI platforms (e.g., NexIN)

### Out-of-Scope Use

- Medical, legal, or financial decision-making
- Fully autonomous deployment without human oversight
- Generating harmful or unsafe content
## Bias, Risks, and Limitations

- Trained on only ~9.2M tokens, so its knowledge is limited compared to larger models
- Performance is weaker in non-English languages
- May reproduce biases present in the dataset
- Can generate hallucinated or incorrect facts

### Recommendations

- Always keep a human in the loop for critical applications
- Apply filtering or moderation layers for safety
- Fine-tune with curated datasets for better domain performance
## How to Get Started with the Model

Run the bundled terminal chat interface:

```bash
python chat_interface.py
```

Or in Python:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# from_pretrained expects a model directory (config, weights, and
# tokenizer files), not bare corex_tok.model / final_model.pt files.
tokenizer = AutoTokenizer.from_pretrained("path/to/corex")
model = AutoModelForCausalLM.from_pretrained("path/to/corex")

inputs = tokenizer("Hello CoreX!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details

### Training Data

- Samples: 34,559
- Tokens: ~9.2M
- Average length: ~266 tokens
- Max length: 1024 tokens
- Tokenizer: SentencePiece unigram, vocab size 32,000

#### Preprocessing

- Normalization and whitespace handling
- Special tokens: `<pad>`, `<unk>`, `<s>`, `</s>`
### Training Hyperparameters

- Training regime: Mixed precision (CPU/GPU optimized)
- Hidden size: 512
- Layers: 8
- Attention heads: 8 (2 key-value heads)
- Intermediate size: 1365 (SwiGLU)
- Max position embeddings: 2048
- Learning rate: 5e-4 (cosine schedule)
- Optimizer: AdamW (β1=0.9, β2=0.95, weight decay 0.1)
- Batch size: 2, with gradient accumulation to an effective batch of 32
- Steps: 50,000
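The cosine learning-rate schedule can be sketched as follows; the warmup length is an assumption, since the card only specifies the 5e-4 peak and the 50,000 steps:

```python
import math

PEAK_LR, TOTAL_STEPS, WARMUP = 5e-4, 50_000, 1_000  # warmup length assumed

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL_STEPS - WARMUP)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

print(lr_at(WARMUP))       # peak: 5e-4
print(lr_at(TOTAL_STEPS))  # decayed to ~0
```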
#### Speeds, Sizes, Times

- Parameters: ~54.8M
- Checkpoint size: ~220MB
- Optimized for: systems with ~7GB RAM
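The parameter and checkpoint-size figures are consistent with the hyperparameters above. A back-of-the-envelope check, assuming untied input/output embeddings and no bias terms (which the card does not state explicitly):

```python
# Rough parameter count from the card's hyperparameters.
vocab, hidden, layers = 32_000, 512, 8
heads, kv_heads, inter = 8, 2, 1365
head_dim = hidden // heads   # 64

embed = vocab * hidden       # input embeddings
lm_head = vocab * hidden     # output projection (assumed untied)

# Grouped-query attention: full-width Q and O, narrower K/V projections.
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)

# SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * inter

total = embed + lm_head + layers * (attn + mlp)
print(f"~{total / 1e6:.1f}M parameters")    # ~54.8M
print(f"~{total * 4 / 1e6:.0f}MB at fp32")  # ~219MB, i.e. the ~220MB checkpoint
```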
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Held-out samples from the same dataset used for training.

#### Factors

Tested on conversational, code, and math-style prompts.

#### Metrics

Perplexity (PPL) and training loss.

### Results

- PPL decreased steadily across training (exact final values TBD)
- Baseline evaluation shows fluent short-text generation

#### Summary

CoreX v0.1 performs well for a lightweight model on low-resource hardware, but it is not competitive with large-scale LLMs.
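Perplexity here is the exponential of the mean per-token negative log-likelihood; a minimal illustration with toy probabilities (not actual CoreX outputs):

```python
import math

# PPL = exp(mean negative log-likelihood per token).
token_probs = [0.25, 0.10, 0.50, 0.05]  # toy per-token probabilities
nll = [-math.log(p) for p in token_probs]
ppl = math.exp(sum(nll) / len(nll))
print(f"perplexity = {ppl:.2f}")  # lower is better
```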
## Model Examination

The implementation was verified to include rotary position embeddings (RoPE), grouped-query attention (GQA), SwiGLU activations, and RMSNorm.

## Environmental Impact

- Hardware Type: Consumer GPU/CPU
- Hours used: A few days of training
- Cloud Provider: None (trained locally)
- Compute Region: Local system
- Carbon Emitted: Low (small model, short training run)
## Technical Specifications

### Model Architecture and Objective

Decoder-only transformer trained with a causal language modeling objective: 8 layers, SwiGLU, GQA, and RoPE.

### Compute Infrastructure

- Hardware: ~7GB RAM device (tested on consumer GPU/CPU)
- Software: PyTorch, SentencePiece
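Since the stack (decoder-only, RoPE, GQA, SwiGLU, RMSNorm) matches the Llama architecture family, an equivalent configuration can be sketched with Hugging Face `transformers`; this mirrors the card's hyperparameters and is not CoreX's actual config file:

```python
from transformers import LlamaConfig

# Llama-style config mirroring the card's hyperparameters. Whether CoreX
# ties its embeddings is an assumption (untied matches the ~54.8M count).
config = LlamaConfig(
    vocab_size=32_000,
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    num_key_value_heads=2,     # grouped-query attention
    intermediate_size=1365,    # SwiGLU MLP width
    max_position_embeddings=2048,
    tie_word_embeddings=False,
)
print(config.model_type, config.num_key_value_heads)
```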
## Citation

**BibTeX:**

```bibtex
@misc{corex2025,
  title={CoreX v0.1: Lightweight Transformer Language Model},
  author={Nexizan Company},
  year={2025},
  license={Apache-2.0}
}
```

**APA:**

Nexizan Company. (2025). *CoreX v0.1: Lightweight Transformer Language Model*.
## Glossary

- **RoPE**: Rotary Position Embedding
- **SwiGLU**: Swish-Gated Linear Unit
- **RMSNorm**: Root Mean Square Normalization
- **GQA**: Grouped Query Attention

## More Information

CoreX is intended as a stepping stone toward future versions with larger parameter counts and better datasets.

## Model Card Authors

Nexizan Company CoreX Team

## Model Card Contact

N/A