lit69 committed
Commit 487feec · verified · 1 Parent(s): 20ee890

Update README.md

Files changed (1)
  1. README.md +79 -65
README.md CHANGED
@@ -1,80 +1,89 @@
 
 
 
 
 
 
  Model Card for CoreX v0.1

- This model card documents CoreX v0.1, a lightweight transformer-based language model developed by Nexizan Company. CoreX is optimized for low-memory systems while enabling offline AI assistants, coding tutors, and sandbox research.

  Model Details
  Model Description

  Developed by: Nexizan Company

- Funded by [optional]: Self-funded

- Shared by [optional]: Nexizan Company CoreX team

- Model type: Decoder-only Transformer (causal LM)

- Language(s) (NLP): English

  License: Apache-2.0

- Finetuned from model [optional]: Trained from scratch

- Model Sources [optional]

- Repository: [To be added]

- Paper [optional]: N/A

- Demo [optional]: Local chat interface (chat_interface.py)

  Uses
  Direct Use

- Conversational assistant (terminal interface)

  Text generation and summarization

- Code and math assistance

- Educational / research sandbox

- Downstream Use [optional]

- Fine-tuning for domain-specific tasks (education, productivity, research)

- Integration into private offline-first AI platforms (e.g., NexIN)

  Out-of-Scope Use

- Medical, legal, or financial decision-making

- Fully autonomous deployment without human oversight

- Generating harmful or unsafe content

  Bias, Risks, and Limitations

- Trained on ~9.2M tokens → knowledge is limited compared to larger models

- Performance weaker in non-English languages

- May reproduce biases from the dataset

- Can generate hallucinated or incorrect facts

  Recommendations

- Always use human oversight for critical applications

- Apply filtering or moderation layers for safety

- Fine-tune with curated datasets for better domain performance

- How to Get Started with the Model
  python chat_interface.py

- Or in Python:

  from transformers import AutoTokenizer, AutoModelForCausalLM
 
@@ -94,98 +103,103 @@ Tokens: ~9.2M
  Avg length: ~266 tokens

- Max length: 1024 tokens

- Tokenizer: SentencePiece unigram, vocab size 32,000

- Preprocessing [optional]

- Normalization and whitespace handling

- Special tokens for <pad>, <unk>, <s>, </s>

  Training Hyperparameters

- Training regime: Mixed precision (CPU/GPU optimized)

  Hidden size: 512

  Layers: 8

- Attention heads: 8 (2 key-value heads)

  Intermediate size: 1365 (SwiGLU)

- Max position embeddings: 2048

- Learning rate: 5e-4 (cosine schedule)

  Optimizer: AdamW (β1=0.9, β2=0.95, wd=0.1)

- Batch size: 2 (accumulated to 32)

  Steps: 50,000

- Speeds, Sizes, Times [optional]

  Parameters: ~54.8M

  Checkpoint size: ~220MB

- Optimized for: ~7GB RAM systems

  Evaluation
- Testing Data, Factors & Metrics
  Testing Data

- Evaluation with held-out samples from the same dataset

  Factors

- Tested on conversational, code, and math-style prompts

  Metrics

- Perplexity (PPL) and training loss

  Results

- PPL: decreasing across training (exact final values TBD)

- Baseline evaluation shows fluent short-text generation

  Summary

- CoreX v0.1 demonstrates solid performance for a lightweight model on low-resource hardware but is not competitive with large-scale LLMs.

- Model Examination [optional]

- Architecture verified with rotary embeddings, grouped query attention, SwiGLU, and RMSNorm.

  Environmental Impact

  Hardware Type: Consumer GPU/CPU

- Hours used: Few days of training

  Cloud Provider: None (local)

- Compute Region: Local system
-
- Carbon Emitted: Low (small model size)

- Technical Specifications [optional]
  Model Architecture and Objective

- Decoder-only transformer, 8 layers, SwiGLU, GQA, RoPE

  Compute Infrastructure

- Hardware: ~7GB RAM device (tested on consumer GPU/CPU)

  Software: PyTorch, SentencePiece

- Citation [optional]

  BibTeX:
 
@@ -198,25 +212,25 @@ BibTeX:
  APA:
- Nexizan Company. (2025). CoreX v0.1: Lightweight Transformer Language Model.

- Glossary [optional]

- RoPE: Rotary Position Embedding

  SwiGLU: Swish-Gated Linear Unit

- RMSNorm: Root Mean Square Normalization

  GQA: Grouped Query Attention

- More Information [optional]

- CoreX is intended as a stepping stone toward future versions with larger parameter counts and better datasets.

- Model Card Authors [optional]

- Nexizan Company CoreX Team

  Model Card Contact

+ ---
+ license: apache-2.0
+ language:
+ - en
+ ---
  Model Card for CoreX v0.1

+ CoreX v0.1 is a lightweight, decoder-only transformer built by Nexizan Company. It is designed to run efficiently on low-resource systems (~7 GB RAM) while supporting offline AI assistants, coding tutors, and sandbox experiments.

  Model Details
  Model Description

  Developed by: Nexizan Company

+ Funded by: Self-funded

+ Shared by: Nexizan Inc. *CoreX team* (Faisal - *LitRush*)

+ Model type: Causal LM (transformer, decoder-only)

+ Language(s): English

  License: Apache-2.0

+ Finetuned from model: None (trained from scratch)

+ Model Sources

+ Repository: to be added

+ Paper: N/A

+ Demo: Local CLI via chat_interface.py

  Uses
  Direct Use

+ Chat-based assistant (offline/terminal)

  Text generation and summarization

+ Code and math Q&A

+ Educational or personal projects

+ Downstream Use

+ Domain-specific fine-tuning (education, productivity, private tools)

+ Integration into offline AI platforms (e.g., NexIN prototype)

  Out-of-Scope Use

+ Medical, financial, or legal advice

+ Safety-critical or autonomous systems

+ Content generation without moderation

  Bias, Risks, and Limitations

+ Limited training size (~9.2M tokens) → restricted knowledge

+ Biases from the dataset may appear in responses

+ Non-English performance is weak

+ Risk of hallucinations or unsafe generations

  Recommendations

+ Use a moderation/filtering layer in deployment
+
+ Fine-tune with curated, domain-specific datasets
+
+ Always keep a human in the loop for sensitive applications

+ How to Get Started

+ Run the interactive chat interface:

  python chat_interface.py

+ Or load directly in Python:

  from transformers import AutoTokenizer, AutoModelForCausalLM
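A minimal loading-and-generation sketch in the same spirit as the snippet above (the repository ID below is a placeholder, since the official repository link is still to be added; point it at the published repo or a local checkpoint directory):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder ID -- the official CoreX repository is still "to be added";
# replace with the published repo name or a local checkpoint path.
model_id = "nexizan/corex-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain rotary position embeddings in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Keep generation short on ~7 GB RAM systems.
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```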
 
 
  Avg length: ~266 tokens

+ Max length: 1024

+ Tokenizer: SentencePiece unigram, vocab=32,000

+ Preprocessing

+ Unicode normalization

+ Special tokens (<pad>, <unk>, <s>, </s>)
+
+ Deduplication and filtering
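A rough sketch of how a tokenizer matching this description could be trained with SentencePiece; the corpus filename and the exact special-token ID assignments are assumptions, not taken from the CoreX training code:

```python
import sentencepiece as spm

# Hypothetical training call for a unigram tokenizer with the vocab size and
# special tokens listed above; "corpus.txt" and the ID order are assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="corex_sp",
    model_type="unigram",
    vocab_size=32000,
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
    pad_piece="<pad>", unk_piece="<unk>", bos_piece="<s>", eos_piece="</s>",
)

sp = spm.SentencePieceProcessor(model_file="corex_sp.model")
print(sp.encode("Hello from CoreX", out_type=str))
```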
 
  Training Hyperparameters

+ Regime: Mixed precision (CPU/GPU optimized)

  Hidden size: 512

  Layers: 8

+ Attention heads: 8 (2 KV heads)

  Intermediate size: 1365 (SwiGLU)

+ Max positions: 2048

+ Learning rate: 5e-4 (cosine decay, warmup 1k steps)

  Optimizer: AdamW (β1=0.9, β2=0.95, wd=0.1)

+ Batch size: 2 (effective 32 with accumulation)

  Steps: 50,000
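For readers who want to reproduce the shape of the model, the hyperparameters above map onto a Llama-style configuration roughly as follows. This assumes a Hugging Face `LlamaConfig`-compatible implementation, which the card does not confirm; note that batch size 2 with 16 accumulation steps gives the effective batch of 32:

```python
from transformers import LlamaConfig

# Hypothetical mapping of the listed hyperparameters onto a Llama-style config;
# CoreX's actual config class and field names may differ.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=512,
    num_hidden_layers=8,
    num_attention_heads=8,
    num_key_value_heads=2,          # grouped-query attention: 4 query heads per KV head
    intermediate_size=1365,         # SwiGLU feed-forward width
    max_position_embeddings=2048,
)

# Optimizer and schedule values as listed in the card (argument names assumed):
training = dict(
    learning_rate=5e-4,             # cosine decay with ~1k warmup steps
    adam_beta1=0.9,
    adam_beta2=0.95,
    weight_decay=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16, # 2 x 16 = effective batch size 32
    max_steps=50_000,
)
```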
 
+ Speeds, Sizes, Times

  Parameters: ~54.8M

  Checkpoint size: ~220MB

+ Hardware target: 7 GB RAM systems

  Evaluation
  Testing Data

+ Held-out samples from training corpus

  Factors

+ Conversational text, code snippets, math expressions

  Metrics

+ Perplexity (PPL), loss
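Perplexity here is simply the exponential of the mean token-level cross-entropy on held-out text; a minimal sketch of how it can be computed with the loaded model (the helper name and truncation length are illustrative):

```python
import math
import torch

def perplexity(model, tokenizer, texts, max_length=1024):
    """Approximate corpus perplexity: exp of the token-weighted mean loss."""
    total_loss, total_tokens = 0.0, 0
    model.eval()
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
            out = model(**enc, labels=enc["input_ids"])
            n_tokens = enc["input_ids"].numel()
            total_loss += out.loss.item() * n_tokens
            total_tokens += n_tokens
    return math.exp(total_loss / total_tokens)
```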
 
  Results

+ Training loss decreased steadily

+ Early tests show coherent text and code generation

  Summary

+ CoreX v0.1 achieves usable fluency for small-scale tasks. It is not comparable to large LLMs, but it is well suited to lightweight, private, offline use.
+
+ Model Examination

+ Architecture: 8-layer decoder, RoPE, SwiGLU, RMSNorm, GQA

+ Tokenizer verified (32k vocab, unigram SentencePiece)

  Environmental Impact

  Hardware Type: Consumer GPU/CPU

+ Training Time: Several days (low resource)

  Cloud Provider: None (local)

+ Carbon Emitted: Minimal (small model)

+ Technical Specifications
  Model Architecture and Objective

+ Decoder-only transformer
+
+ RoPE embeddings, SwiGLU MLP, RMSNorm
+
+ Grouped Query Attention
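For reference, the two less common building blocks named above have the standard definitions below (generic formulas, not CoreX-specific code):

```latex
% RMSNorm: rescale by the root mean square of the activations (no mean centering)
\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_{i=1}^{d} x_i^2 + \epsilon}} \odot g

% SwiGLU feed-forward block: SiLU-gated linear unit
\mathrm{SwiGLU}(x) = \left(\mathrm{SiLU}(x W_{\mathrm{gate}}) \odot (x W_{\mathrm{up}})\right) W_{\mathrm{down}}
```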
 
  Compute Infrastructure

+ Hardware: ~7 GB RAM system

  Software: PyTorch, SentencePiece

+ Citation

  BibTeX:
 
 
  APA:
+ Nexizan Inc. (2025). CoreX v0.1: Lightweight Transformer Language Model.

+ Glossary

+ RoPE: Rotary Position Embeddings

  SwiGLU: Swish-Gated Linear Unit

+ RMSNorm: Root Mean Square Normalization

  GQA: Grouped Query Attention

+ More Information

+ CoreX v0.1 is the first milestone in the CoreX series, focused on offline-first, privacy-respecting AI systems. Future versions aim for larger datasets, more parameters, and better reasoning ability.

+ Model Card Authors

+ Nexizan Inc. CoreX Team

  Model Card Contact