Update README.md

---
license: apache-2.0
language:
- en
---
Model Card for CoreX v0.1

CoreX v0.1 is a lightweight, decoder-only transformer built by Nexizan Company. It is designed to run efficiently on low-resource systems (~7 GB RAM) while supporting offline AI assistants, coding tutors, and sandbox experiments.
Model Details

Model Description

Developed by: Nexizan Company

Funded by: Self-funded

Shared by: Nexizan Inc. *CoreX team* (Faisal, *LitRush*)

Model type: Causal LM (transformer, decoder-only)

Language(s): English

License: Apache-2.0

Finetuned from model: None (trained from scratch)
Model Sources

Repository: to be added

Paper: N/A

Demo: Local CLI via chat_interface.py
Uses

Direct Use

Chat-based assistant (offline/terminal)

Text generation and summarization

Code and math Q&A

Educational or personal projects

Downstream Use

Domain-specific fine-tuning (education, productivity, private tools)

Integration into offline AI platforms (e.g., NexIN prototype)
Out-of-Scope Use

Medical, financial, or legal advice

Safety-critical or autonomous systems

Content generation without moderation

Bias, Risks, and Limitations

Limited training data (~9.2M tokens) restricts the model's knowledge

Biases present in the dataset may appear in responses

Non-English performance is weak

Risk of hallucinations or unsafe generations

Recommendations

Use a moderation/filtering layer in deployment

Fine-tune with curated, domain-specific datasets

Always keep a human-in-the-loop for sensitive applications
How to Get Started

Run the interactive chat interface:

python chat_interface.py

Or load directly in Python:

from transformers import AutoTokenizer, AutoModelForCausalLM
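A minimal end-to-end sketch of loading and generating, assuming a hypothetical `Nexizan/CoreX-v0.1` model id (the repository link above is still to be added); substitute a local checkpoint path if needed:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Nexizan/CoreX-v0.1"  # hypothetical id; replace with your local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain what a tokenizer does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Keep generations short; this is a ~54.8M-parameter model.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```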
Avg length: ~266 tokens

Max length: 1024

Tokenizer: SentencePiece unigram, vocab=32,000

Preprocessing

Unicode normalization

Special tokens (<pad>, <unk>, <s>, </s>)

Deduplication and filtering
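A tokenizer with these properties can be reproduced with the sentencepiece library. The sketch below is illustrative only; the corpus file name and exact flags are assumptions, not the team's actual training command:

```python
import sentencepiece as spm

# Assumed corpus file and flags; the actual preprocessing pipeline is not published.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="corex_unigram",
    model_type="unigram",
    vocab_size=32000,
    normalization_rule_name="nmt_nfkc",  # Unicode normalization
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
    pad_piece="<pad>", unk_piece="<unk>", bos_piece="<s>", eos_piece="</s>",
)

sp = spm.SentencePieceProcessor(model_file="corex_unigram.model")
print(sp.encode("Hello CoreX", out_type=str))
```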
Training Hyperparameters

Regime: Mixed precision (CPU/GPU optimized)

Hidden size: 512

Layers: 8

Attention heads: 8 (2 KV heads)

Intermediate size: 1365 (SwiGLU)

Max positions: 2048

Learning rate: 5e-4 (cosine decay, warmup 1k steps)

Optimizer: AdamW (β1=0.9, β2=0.95, wd=0.1)

Batch size: 2 (effective 32 with gradient accumulation)

Steps: 50,000
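For reference, the same hyperparameters collected as a plain Python dict (a hypothetical summary; the key names are not taken from the actual training script):

```python
corex_v0_1 = {
    "hidden_size": 512,
    "num_layers": 8,
    "num_attention_heads": 8,
    "num_kv_heads": 2,                   # Grouped Query Attention
    "intermediate_size": 1365,           # SwiGLU MLP
    "max_position_embeddings": 2048,
    "vocab_size": 32000,
    "learning_rate": 5e-4,               # cosine decay, 1k warmup steps
    "adam_betas": (0.9, 0.95),
    "weight_decay": 0.1,
    "micro_batch_size": 2,
    "gradient_accumulation_steps": 16,   # 2 x 16 = effective batch size 32
    "train_steps": 50_000,
}
```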
|
Speeds, Sizes, Times

Parameters: ~54.8M

Checkpoint size: ~220MB

Hardware target: 7 GB RAM systems

Evaluation
Testing Data

Held-out samples from the training corpus

Factors

Conversational text, code snippets, math expressions

Metrics

Perplexity (PPL), loss
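Perplexity is the exponential of the mean token-level cross-entropy on the held-out data. A small sketch of computing it with a Hugging Face causal LM (an assumed evaluation loop that ignores padding, not the team's exact script):

```python
import math
import torch

@torch.no_grad()
def perplexity(model, batches):
    """Approximate PPL as exp of the mean cross-entropy over held-out batches."""
    losses = []
    for batch in batches:  # each batch: {"input_ids": ..., "attention_mask": ...}
        out = model(**batch, labels=batch["input_ids"])
        losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))
```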
|
Results

Training loss decreased steadily

Early tests show coherent text and code generation

Summary

CoreX v0.1 achieves usable fluency for small-scale tasks. It is not comparable to large LLMs, but excels at lightweight, private, offline usage.
Model Examination

Architecture: 8-layer decoder, RoPE, SwiGLU, RMSNorm, GQA

Tokenizer verified (32k vocab, unigram SentencePiece)

Environmental Impact

Hardware Type: Consumer GPU/CPU

Training Time: Several days (low resource)

Cloud Provider: None (local)

Carbon Emitted: Minimal (small model)
|
Technical Specifications

Model Architecture and Objective

Decoder-only transformer

RoPE embeddings, SwiGLU MLP, RMSNorm

Grouped Query Attention
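For readers unfamiliar with the SwiGLU block, here is a minimal PyTorch sketch using the card's dimensions (512 hidden, 1365 intermediate); it illustrates the general technique, not CoreX's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """LLaMA-style gated MLP: silu(x W_gate) * (x W_up), projected back down."""

    def __init__(self, hidden_size: int = 512, intermediate_size: int = 1365):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```

With Grouped Query Attention, the 8 query heads share 2 key/value heads, so each key/value head serves 4 query heads and the KV cache is roughly a quarter the size of full multi-head attention.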
|
Compute Infrastructure

Hardware: ~7 GB RAM system

Software: PyTorch, SentencePiece

Citation

BibTeX:
APA:

Nexizan Inc. (2025). CoreX v0.1: Lightweight Transformer Language Model.

Glossary

RoPE: Rotary Position Embeddings

SwiGLU: Swish-Gated Linear Unit

RMSNorm: Root Mean Square Normalization

GQA: Grouped Query Attention

More Information

CoreX v0.1 is the first milestone in the CoreX series, focused on offline-first, privacy-respecting AI systems. Future versions aim for larger datasets, more parameters, and better reasoning ability.

Model Card Authors

Nexizan Inc., CoreX Team

Model Card Contact