--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
base_model: |
|
|
- algorythmtechnologies/Supernova25million |
|
|
--- |
|
|
# Supernova (25M) — AlgoRythm Technologies |
|
|
|
|
|
**Enhanced AI Assistant with Tool Integration** |
|
|
|
|
|
Supernova is a decoder-only Transformer built from scratch with exactly 25,000,000 parameters, using the GPT‑2 tokenizer (vocab size 50,257). The parameter budget is met exactly, down to the last parameter. |
|
|
|
|
|
**🚀 Enhanced with Advanced AI Capabilities:** |
|
|
- **🧠 Advanced Reasoning Engine**: Multi-step problem solving, knowledge synthesis, domain expertise analysis |
|
|
- **📊 Math Engine Integration**: Advanced mathematical computations, scientific calculations, engineering equations |
|
|
- **🔍 Serper Web Search**: Real-time information, current events, factual queries |
|
|
- **🎓 Multi-Domain Expertise**: Science, Technology, Medicine, Business, Humanities, Arts |
|
|
- **⚡ Smart Tool Coordination**: Intelligent routing and chaining of multiple tools for complex queries |
|
|
- **🔬 Sophisticated Analysis**: Context-aware responses with evidence synthesis and comprehensive reasoning |
|
|
|
|
|
Key specs: |
|
|
- Exact params: 25,000,000 |
|
|
- Tokenizer: GPT‑2 (vocab_size = 50,257) |
|
|
- d_model: 320 |
|
|
- n_layers: 6 |
|
|
- n_heads: 10 (head_dim = 32) |
|
|
- n_positions: 4,748 (learned positional embeddings) |
|
|
- MLP ratio: 4.0 (hidden_size = 4 × d_model) |
|
|
- Weight tying: yes (LM head shares token embedding weights; no LM head bias) |
|
|
- Dropout: configurable (default 0.1) |
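
The derived dimensions in the spec list can be checked with a few lines of Python. The dictionary below simply mirrors the specs above; the actual field names in configs/supernova_25m.json may differ:

```python
# Config dict mirroring the spec list above (field names are illustrative;
# the real configs/supernova_25m.json may use different keys).
config = {
    "vocab_size": 50_257,
    "d_model": 320,
    "n_layers": 6,
    "n_heads": 10,
    "n_positions": 4_748,
    "mlp_ratio": 4.0,
    "dropout": 0.1,
    "tie_weights": True,
}

head_dim = config["d_model"] // config["n_heads"]          # 320 / 10 = 32
mlp_hidden = int(config["mlp_ratio"] * config["d_model"])  # 4 * 320 = 1280

print(head_dim, mlp_hidden)  # 32 1280
```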
|
|
|
|
|
Why these numbers? They are chosen so that the total parameter count equals exactly 25,000,000 with GPT‑2 vocab size, using learned positional embeddings and tied output head. |
|
|
|
|
|
Parameter proof sketch (matches code): |
|
|
- Token embeddings: 50,257 × 320 = 16,082,240 |
|
|
- Positional embeddings: 4,748 × 320 = 1,519,360 |
|
|
- Per block: 12·d^2 + 13·d = 12·(320^2) + 13·320 = 1,228,800 + 4,160 = 1,232,960 |
|
|
- 6 blocks total: 7,397,760 |
|
|
- Final LayerNorm: 2·d = 640 |
|
|
- Total = 16,082,240 + 1,519,360 + 7,397,760 + 640 = 25,000,000 |
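
The proof sketch above can be reproduced as standalone arithmetic, independent of supernova/verify_params.py:

```python
V, P, d, L = 50_257, 4_748, 320, 6  # vocab, positions, width, layers

tok_emb = V * d                  # 16,082,240
pos_emb = P * d                  #  1,519,360
# Per block: QKV (3d^2 + 3d) + attn proj (d^2 + d) + MLP (8d^2 + 5d)
# + two LayerNorms (4d) = 12d^2 + 13d
per_block = 12 * d * d + 13 * d  #  1,232,960
final_ln = 2 * d                 #        640
# The LM head is tied to the token embedding, so it adds no parameters.
total = tok_emb + pos_emb + L * per_block + final_ln
print(total)  # 25000000
```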
|
|
|
|
|
The verification script (supernova/verify_params.py) asserts this at runtime. |
|
|
|
|
|
Brand behavior: |
|
|
- The chat wrapper returns the AlgoRythm Tech – Company Profile & Vision text (branding/ALGORHYTHM_TECH_PROFILE.txt) whenever a prompt asks about AlgoRythm Tech, its company profile, or its vision. |
|
|
|
|
|
Caution on scope: |
|
|
- “Knows everything that happened in the world” is not achievable in a single model; instead, this repo provides a scalable pipeline to train on broad, diverse, and massive text corpora. You control the data sources via a YAML config. |
|
|
|
|
|
Quickstart |
|
|
|
|
|
1) Install dependencies (Windows PowerShell) |
|
|
- Ensure Python 3.10+ is installed |
|
|
- Navigate to the project |
|
|
cd C:\Users\sriaa\supernova |
|
|
- Install dependencies |
|
|
pip install -r requirements.txt |
|
|
- If PyTorch wheel needs a specific index (GPU/CPU), follow https://pytorch.org/get-started/locally/ |
|
|
|
|
|
2) Verify exact parameter count and tokenizer vocabulary size |
|
|
python -m supernova.verify_params --config .\configs\supernova_25m.json |
|
|
Expected output includes: |
|
|
- vocab_size: 50257 |
|
|
- total_params: 25000000 (EXACT) |
|
|
|
|
|
3) Prepare data config (comprehensive knowledge training) |
|
|
- For comprehensive coverage across all subjects: |
|
|
copy .\configs\comprehensive_data_sources.yaml .\configs\data_sources.yaml |
|
|
- Or for basic setup: |
|
|
copy .\configs\data_sources.example.yaml .\configs\data_sources.yaml |
|
|
- Edit the file and enable/disable sources you want. Many are large and require significant bandwidth. |
|
|
|
|
|
4) Train (logs gradient norm and uses a strong LR schedule) |
|
|
python -m supernova.train `
  --config .\configs\supernova_25m.json `
  --data-config .\configs\data_sources.yaml `
  --seq-len 1024 `
  --batch-size 16 `
  --grad-accum 8 `
  --lr 3e-4 `
  --warmup-steps 2000 `
  --max-steps 100000 `
  --save-every 10000
|
|
Notes: |
|
|
- Gradient norm is printed regularly (no clipping by default). |
|
|
- Adjust batch size, gradient accumulation, and sequence length to fit your hardware. |
|
|
- Cosine decay schedule with warmup is applied. |
|
|
|
|
|
5) Advanced Chat with Enhanced Reasoning (brand-aware; post-training) |
|
|
# API keys are already configured in configs/api_keys.yaml |
|
|
# - Math Engine: Built-in SymPy-based mathematical computation (no API key needed) |
|
|
# - Serper: Web search API configured |
|
|
|
|
|
# Advanced interactive chat with sophisticated reasoning |
|
|
python .\chat_advanced.py --config .\configs\supernova_25m.json |
|
|
|
|
|
# Single prompt mode with advanced analysis |
|
|
python .\chat_advanced.py --config .\configs\supernova_25m.json --prompt "Analyze the implications of artificial intelligence on healthcare from multiple perspectives" |
|
|
|
|
|
# Basic enhanced chat (legacy) |
|
|
python .\chat_enhanced.py --config .\configs\supernova_25m.json |
|
|
|
|
|
- **🧐 Complex reasoning queries** → Multi-step analysis using reasoning engine |
|
|
- **📊 Mathematical queries** → Routed to math engine for precise calculations |
|
|
- **🔍 Current events/facts** → Routed to Serper for real-time web search |
|
|
- **🏢 AlgoRythm Tech queries** → Returns company profile |
|
|
- **📚 Multi-domain questions** → Synthesizes expertise across scientific, technical, and academic fields |
|
|
- **🎓 General knowledge** → Enhanced model generation with sophisticated context |
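
The routing above can be sketched as simple keyword dispatch. This is illustrative only: the tool names and heuristics below are hypothetical stand-ins for whatever chat_advanced.py actually does:

```python
def route(query: str) -> str:
    """Pick a tool for a query via keyword heuristics (illustrative sketch)."""
    q = query.lower()
    if any(k in q for k in ("algorythm tech", "company profile", "vision")):
        return "brand_profile"    # return the bundled company profile text
    if any(k in q for k in ("solve", "integral", "equation", "calculate")):
        return "math_engine"      # SymPy-based computation
    if any(k in q for k in ("latest", "today", "news", "current")):
        return "web_search"       # Serper real-time search
    return "model_generation"     # fall back to the language model

print(route("Calculate the integral of x^2"))     # math_engine
print(route("What is AlgoRythm Tech's vision?"))  # brand_profile
```

The real wrapper also chains tools for complex queries; a production router would need intent classification rather than substring matching.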
|
|
|
|
|
Data sources (broad options) |
|
|
- Options are listed in configs/data_sources.example.yaml; enable them selectively. Examples: |
|
|
- c4/en (Colossal Clean Crawled Corpus) |
|
|
- wikipedia/en |
|
|
- openwebtext |
|
|
- bookcorpusopen |
|
|
- the_pile |
|
|
Notes: |
|
|
- Review licenses and terms of each dataset. |
|
|
- You can add your own sources. The pipeline streams and interleaves by weight. |
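
The "streams and interleaves by weight" behavior can be sketched as weighted sampling over iterators. This is a simplified stand-in for the actual pipeline, not its implementation:

```python
import random

def interleave_by_weight(sources, weights, seed=0):
    """Yield items from several iterables, sampling each source in
    proportion to its weight; a source is dropped once exhausted."""
    rng = random.Random(seed)
    its = [iter(s) for s in sources]
    weights = list(weights)
    while its:
        i = rng.choices(range(len(its)), weights=weights, k=1)[0]
        try:
            yield next(its[i])
        except StopIteration:
            del its[i], weights[i]

# Example: the first source weighted 3x heavier than the second.
mixed = list(interleave_by_weight([["w1", "w2"], ["b1"]], [3.0, 1.0]))
print(sorted(mixed))  # ['b1', 'w1', 'w2'] (every item appears exactly once)
```

The real pipeline streams from remote datasets rather than in-memory lists, but the weighted-choice loop is the core idea.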
|
|
|
|
|
Training details |
|
|
- Optimizer: AdamW (betas=(0.9, 0.95), weight_decay=0.1) |
|
|
- LR schedule: cosine decay with linear warmup |
|
|
- Gradient norm: computed every log step and printed |
|
|
- Mixed precision: optional (bf16/fp16) if available |
|
|
- Checkpointing: periodic saving to output directory |
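
Under the stated settings (lr 3e-4, 2,000 warmup steps, 100,000 max steps), the schedule can be sketched in pure Python. The actual trainer in supernova/train.py may compute it differently:

```python
import math

def lr_at(step, max_lr=3e-4, warmup=2000, max_steps=100_000):
    """Linear warmup to max_lr, then cosine decay to zero (sketch)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

print(f"{lr_at(1999):.2e}")     # end of warmup: 3.00e-04
print(f"{lr_at(100_000):.2e}")  # end of training: 0.00e+00
```

Pairing this with AdamW (betas=(0.9, 0.95), weight_decay=0.1) matches the common GPT-style recipe: warmup avoids early instability, and cosine decay anneals smoothly without step drops.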
|
|
|
|
|
Brand profile |
|
|
- File: branding/ALGORHYTHM_TECH_PROFILE.txt |
|
|
- The chat wrapper uses this exact text for company-related queries. |
|
|
|
|
|
License |
|
|
- Apache 2.0 (see LICENSE) |
|
|
|
|
|
Attribution |
|
|
- Built by AlgoRythm Technologies. |