---
license: apache-2.0
language:
- en
metrics:
- accuracy
- character
---

# Model Card for CoreX v0.1

CoreX v0.1 is a lightweight, decoder-only transformer built by Nexizan Company. It is designed to run efficiently on low-resource systems (~7 GB RAM) while supporting offline AI assistants, coding tutors, and sandbox experiments.
## Model Details

### Model Description

- **Developed by:** Nexizan Company
- **Funded by:** Self-funded
- **Shared by:** Nexizan Inc. *CoreX team* (Faisal, *LitRush*)
- **Model type:** Causal LM (transformer, decoder-only)
- **Language(s):** English
- **License:** Apache-2.0
- **Finetuned from model:** None (trained from scratch)

### Model Sources

- **Repository:** to be added
- **Paper:** N/A
- **Demo:** Local CLI via `chat_interface.py`
## Uses

### Direct Use

- Chat-based assistant (offline/terminal)
- Text generation and summarization
- Code and math Q&A
- Educational or personal projects

### Downstream Use

- Domain-specific fine-tuning (education, productivity, private tools)
- Integration into offline AI platforms (e.g., the NexIN prototype)

### Out-of-Scope Use

- Medical, financial, or legal advice
- Safety-critical or autonomous systems
- Content generation without moderation
## Bias, Risks, and Limitations

- Limited training size (~9.2M tokens) means restricted knowledge coverage
- Biases present in the training data may surface in responses
- Non-English performance is weak
- Risk of hallucinations or unsafe generations

### Recommendations

- Use a moderation/filtering layer in deployment
- Fine-tune with curated, domain-specific datasets
- Always keep a human in the loop for sensitive applications
## How to Get Started

Run the interactive chat interface:

```bash
python chat_interface.py
```

Or load directly in Python:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# from_pretrained expects a model directory (or Hub repo ID) saved in Hugging Face
# format, not individual files such as corex_tok.model or final_model.pt.
tokenizer = AutoTokenizer.from_pretrained("path/to/corex_model")
model = AutoModelForCausalLM.from_pretrained("path/to/corex_model")

inputs = tokenizer("Hello CoreX!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details

### Training Data

- Samples: 34,559
- Tokens: ~9.2M
- Average length: ~266 tokens
- Max length: 1,024 tokens
- Tokenizer: SentencePiece unigram, vocab = 32,000

### Preprocessing

- Unicode normalization
- Special tokens (`<pad>`, `<unk>`, `<s>`, `</s>`)
- Deduplication and filtering
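As a rough sketch of how a tokenizer with these properties could be produced, the example below trains a SentencePiece unigram model with a 32k vocabulary and the special tokens listed above. The corpus path and normalization rule are placeholders, not the exact settings used for CoreX.

```python
import sentencepiece as spm

# Illustrative only: file names and the normalization rule are assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt",                    # deduplicated, filtered training text
    model_prefix="corex_tok",              # produces corex_tok.model / corex_tok.vocab
    model_type="unigram",
    vocab_size=32000,
    pad_id=0, unk_id=1, bos_id=2, eos_id=3,
    pad_piece="<pad>", unk_piece="<unk>", bos_piece="<s>", eos_piece="</s>",
    normalization_rule_name="nmt_nfkc",    # Unicode normalization
)
```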
### Training Hyperparameters

- Training regime: mixed precision (CPU/GPU optimized)
- Hidden size: 512
- Layers: 8
- Attention heads: 8 (2 KV heads)
- Intermediate size: 1365 (SwiGLU)
- Max positions: 2048
- Learning rate: 5e-4 (cosine decay, 1k warmup steps)
- Optimizer: AdamW (β1=0.9, β2=0.95, weight decay 0.1)
- Batch size: 2 (effective 32 with gradient accumulation)
- Steps: 50,000
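For convenience, the same settings can be collected into a single configuration object. The sketch below is illustrative; the field names follow common conventions and are not the actual CoreX config schema.

```python
from dataclasses import dataclass

# Illustrative configuration mirroring the hyperparameters listed above.
@dataclass
class CoreXConfig:
    vocab_size: int = 32_000
    hidden_size: int = 512
    num_layers: int = 8
    num_attention_heads: int = 8
    num_kv_heads: int = 2                   # grouped-query attention
    intermediate_size: int = 1365           # SwiGLU MLP
    max_position_embeddings: int = 2048
    learning_rate: float = 5e-4             # cosine decay, 1k warmup steps
    adam_beta1: float = 0.9
    adam_beta2: float = 0.95
    weight_decay: float = 0.1
    micro_batch_size: int = 2
    gradient_accumulation_steps: int = 16   # effective batch size of 32
    max_steps: int = 50_000
```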
### Speeds, Sizes, Times

- Parameters: ~54.8M
- Checkpoint size: ~220 MB
- Hardware target: systems with ~7 GB RAM
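As a back-of-the-envelope check, the reported parameter count and checkpoint size are consistent with the architecture described in this card, assuming untied input/output embeddings and ignoring normalization parameters (both assumptions, since the card does not state them).

```python
# Rough parameter-count estimate from the stated architecture (assumptions noted above).
vocab, d, layers, inter, kv_heads, head_dim = 32_000, 512, 8, 1365, 2, 64

embed = vocab * d                                     # input embeddings
lm_head = vocab * d                                   # output projection (assumed untied)
attn = d * d + 2 * d * kv_heads * head_dim + d * d    # Q, K+V (GQA), and O projections
mlp = 3 * d * inter                                   # SwiGLU gate/up/down
total = embed + lm_head + layers * (attn + mlp)

print(f"~{total / 1e6:.1f}M parameters")              # ~54.8M
print(f"~{total * 4 / 1e6:.0f} MB in fp32")           # ~219 MB, close to the ~220 MB checkpoint
```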
## Evaluation

### Testing Data

- Held-out samples from the training corpus

### Factors

- Conversational text, code snippets, math expressions

### Metrics

- Perplexity (PPL), loss
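For reference, perplexity here is simply the exponential of the mean token-level cross-entropy loss, as in the short sketch below.

```python
import math

# Perplexity from the mean token-level cross-entropy loss (in nats).
def perplexity(mean_ce_loss: float) -> float:
    return math.exp(mean_ce_loss)

print(perplexity(3.0))  # a loss of 3.0 nats/token corresponds to PPL ≈ 20.1
```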
### Results

- Training loss decreased steadily over the run
- Early tests show coherent text and code generation

### Summary

CoreX v0.1 achieves usable fluency for small-scale tasks. It is not comparable to large LLMs, but it is well suited to lightweight, private, offline usage.
## Model Examination

- Architecture: 8-layer decoder with RoPE, SwiGLU, RMSNorm, and GQA
- Tokenizer verified (32k vocab, unigram SentencePiece)
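A quick tokenizer sanity check might look like the following; the model file path is a placeholder for wherever `corex_tok.model` is stored.

```python
import sentencepiece as spm

# Placeholder path: load the shipped tokenizer and confirm its basic properties.
sp = spm.SentencePieceProcessor(model_file="corex_tok.model")
assert sp.get_piece_size() == 32000
print(sp.encode("Hello CoreX!", out_type=str))  # inspect the subword segmentation
```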
## Environmental Impact

- Hardware type: consumer GPU/CPU
- Training time: several days (low-resource setup)
- Cloud provider: none (trained locally)
- Carbon emitted: minimal (small model)
## Technical Specifications

### Model Architecture and Objective

- Decoder-only transformer with a causal language modeling objective
- RoPE position embeddings, SwiGLU MLP, RMSNorm
- Grouped Query Attention (GQA)
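To make these terms concrete, below is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block at CoreX's stated sizes. This illustrates the standard formulations only and is not the actual CoreX implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescales activations without mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block (512 -> 1365 -> 512 at CoreX's stated sizes)."""
    def __init__(self, hidden_size: int = 512, intermediate_size: int = 1365):
        super().__init__()
        self.gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```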
### Compute Infrastructure

- Hardware: system with ~7 GB RAM
- Software: PyTorch, SentencePiece
## Citation

**BibTeX:**

```bibtex
@misc{corex2025,
  title={CoreX v0.1: Lightweight Transformer Language Model},
  author={Nexizan Company},
  year={2025},
  note={Apache-2.0 license}
}
```

**APA:**

Nexizan Inc. (2025). *CoreX v0.1: Lightweight Transformer Language Model*.
## Glossary

- **RoPE:** Rotary Position Embeddings
- **SwiGLU:** Swish-Gated Linear Unit
- **RMSNorm:** Root Mean Square Norm
- **GQA:** Grouped Query Attention
## More Information

CoreX v0.1 is the first milestone in the CoreX series, focused on offline-first, privacy-respecting AI systems. Future versions aim for larger datasets, more parameters, and better reasoning ability.

## Model Card Authors

Nexizan Inc. (CoreX Team)
## Model Card Contact

N/A