---
license: apache-2.0
language:
- en
base_model:
- algorythmtechnologies/suslm
---

# Supernova (25M) — AlgoRythm Technologies

**Enhanced AI Assistant with Tool Integration**

Supernova is a 25,000,000-parameter decoder-only Transformer, built from scratch around the GPT‑2 tokenizer (vocab size 50,257), with the architecture sized to hit the parameter budget exactly (not a single parameter over).

**🚀 Enhanced with Advanced AI Capabilities:**
- **🧠 Advanced Reasoning Engine**: Multi-step problem solving, knowledge synthesis, domain expertise analysis
- **📊 Math Engine Integration**: Advanced mathematical computations, scientific calculations, engineering equations
- **🔍 Serper Web Search**: Real-time information, current events, factual queries
- **🎓 Multi-Domain Expertise**: Science, Technology, Medicine, Business, Humanities, Arts
- **⚡ Smart Tool Coordination**: Intelligent routing and chaining of multiple tools for complex queries
- **🔬 Sophisticated Analysis**: Context-aware responses with evidence synthesis and comprehensive reasoning

## Key specs
- Exact params: 25,000,000
- Tokenizer: GPT‑2 (vocab_size = 50,257)
- d_model: 320
- n_layers: 6
- n_heads: 10 (head_dim = 32)
- n_positions: 4,748 (learned positional embeddings)
- MLP ratio: 4.0 (hidden_size = 4 × d_model)
- Weight tying: yes (LM head shares token embedding weights; no LM head bias)
- Dropout: configurable (default 0.1)

Why these numbers? They were chosen so that, with the GPT‑2 vocab size, learned positional embeddings, and a tied output head, the total parameter count comes out to exactly 25,000,000.

Parameter proof sketch (matches the code):
- Token embeddings: 50,257 × 320 = 16,082,240
- Positional embeddings: 4,748 × 320 = 1,519,360
- Per block: 12·d^2 + 13·d = 12·(320^2) + 13·320 = 1,228,800 + 4,160 = 1,232,960
- 6 blocks total: 7,397,760
- Final LayerNorm: 2·d = 640
- Total = 16,082,240 + 1,519,360 + 7,397,760 + 640 = 25,000,000

The verification script (supernova/verify_params.py) asserts this count at runtime.

## Brand behavior
- The chat wrapper returns the AlgoRythm Tech – Company Profile & Vision text (branding/ALGORHYTHM_TECH_PROFILE.txt) whenever a prompt asks about AlgoRythm Tech, the company profile, or its vision.

## Caution on scope
- A model that “knows everything that happened in the world” is not achievable; instead, this repo provides a scalable pipeline for training on broad, diverse, and massive text corpora. You control the data sources via a YAML config.

## Quickstart

1) Install dependencies (Windows PowerShell)
- Ensure Python 3.10+ is installed
- Navigate to the project:

      cd C:\Users\sriaa\supernova

- Install dependencies:

      pip install -r requirements.txt

- If PyTorch needs a wheel from a specific index (GPU/CPU), follow https://pytorch.org/get-started/locally/

2) Verify the exact parameter count and tokenizer vocabulary size

      python -m supernova.verify_params --config .\configs\supernova_25m.json

Expected output includes:
- vocab_size: 50257
- total_params: 25000000 (EXACT)

3) Prepare a data config (comprehensive knowledge training)
- For comprehensive coverage across all subjects:

      copy .\configs\comprehensive_data_sources.yaml .\configs\data_sources.yaml

- Or for a basic setup:

      copy .\configs\data_sources.example.yaml .\configs\data_sources.yaml

- Edit the file to enable or disable the sources you want. Many are large and require significant bandwidth.

4) Train (logs the gradient norm and uses a warmup + cosine-decay LR schedule)

      python -m supernova.train ^
        --config .\configs\supernova_25m.json ^
        --data-config .\configs\data_sources.yaml ^
        --seq-len 1024 ^
        --batch-size 16 ^
        --grad-accum 8 ^
        --lr 3e-4 ^
        --warmup-steps 2000 ^
        --max-steps 100000 ^
        --save-every 10000

Notes:
- The gradient norm is printed regularly (no clipping by default).
- Adjust batch size, gradient accumulation, and sequence length to your hardware.
- A cosine-decay schedule with warmup is applied.

5) Advanced chat with enhanced reasoning (brand-aware; post-training)

API keys are already configured in configs/api_keys.yaml:
- Math Engine: built-in SymPy-based mathematical computation (no API key needed)
- Serper: web search API configured

Advanced interactive chat with sophisticated reasoning:

      python .\chat_advanced.py --config .\configs\supernova_25m.json

Single-prompt mode with advanced analysis:

      python .\chat_advanced.py --config .\configs\supernova_25m.json --prompt "Analyze the implications of artificial intelligence on healthcare from multiple perspectives"

Basic enhanced chat (legacy):

      python .\chat_enhanced.py --config .\configs\supernova_25m.json

Query routing:
- **🧐 Complex reasoning queries** → Multi-step analysis using the reasoning engine
- **📊 Mathematical queries** → Routed to the math engine for precise calculations
- **🔍 Current events/facts** → Routed to Serper for real-time web search
- **🏢 AlgoRythm Tech queries** → Returns the company profile
- **📚 Multi-domain questions** → Synthesizes expertise across scientific, technical, and academic fields
- **🎓 General knowledge** → Enhanced model generation with sophisticated context

## Data sources (broad options)
Included in configs/data_sources.example.yaml. Examples (enable selectively):
- c4/en (Colossal Clean Crawled Corpus)
- wikipedia/en
- openwebtext
- bookcorpusopen
- the_pile

Notes:
- Review the license and terms of each dataset.
- You can add your own sources; the pipeline streams and interleaves them by weight.

## Training details
- Optimizer: AdamW (betas=(0.9, 0.95), weight_decay=0.1)
- LR schedule: cosine decay with warmup
- Gradient norm: computed and printed at every log step
- Mixed precision: optional (bf16/fp16) if available
- Checkpointing: periodic saving to the output directory

## Brand profile
- File: branding/ALGORHYTHM_TECH_PROFILE.txt
- The chat wrapper uses this exact text for company-related queries.

## License
- Apache 2.0 (see LICENSE)

## Attribution
- Built by AlgoRythm Technologies.