---
license: apache-2.0
language:
- en
base_model:
- algorythmtechnologies/suslm
---

# Supernova (25M) — AlgoRythm Technologies

**Enhanced AI Assistant with Tool Integration**

Supernova is a 25,000,000-parameter decoder-only Transformer, built from scratch around the GPT‑2 tokenizer (vocab size 50,257), with the architecture sized to hit the parameter budget exactly (not a single parameter over).

**🚀 Enhanced with Advanced AI Capabilities:**
- **🧠 Advanced Reasoning Engine**: Multi-step problem solving, knowledge synthesis, domain expertise analysis
- **📊 Math Engine Integration**: Advanced mathematical computations, scientific calculations, engineering equations
- **🔍 Serper Web Search**: Real-time information, current events, factual queries
- **🎓 Multi-Domain Expertise**: Science, Technology, Medicine, Business, Humanities, Arts
- **⚡ Smart Tool Coordination**: Intelligent routing and chaining of multiple tools for complex queries
- **🔬 Sophisticated Analysis**: Context-aware responses with evidence synthesis and comprehensive reasoning

## Key specs
- Exact params: 25,000,000
- Tokenizer: GPT‑2 (vocab_size = 50,257)
- d_model: 320
- n_layers: 6
- n_heads: 10 (head_dim = 32)
- n_positions: 4,748 (learned positional embeddings)
- MLP ratio: 4.0 (hidden_size = 4 × d_model)
- Weight tying: yes (LM head shares token embedding weights; no LM head bias)
- Dropout: configurable (default 0.1)

Why these numbers? They were chosen so that, with the GPT‑2 vocab size, learned positional embeddings, and a tied output head, the total parameter count comes out to exactly 25,000,000.

Parameter proof sketch (matches the code):
- Token embeddings: 50,257 × 320 = 16,082,240
- Positional embeddings: 4,748 × 320 = 1,519,360
- Per block: 12·d^2 + 13·d = 12·(320^2) + 13·320 = 1,228,800 + 4,160 = 1,232,960
- 6 blocks total: 7,397,760
- Final LayerNorm: 2·d = 640
- Total = 16,082,240 + 1,519,360 + 7,397,760 + 640 = 25,000,000

The verification script (supernova/verify_params.py) asserts this count at runtime.

## Brand behavior
- The chat wrapper returns the AlgoRythm Tech – Company Profile & Vision text (branding/ALGORHYTHM_TECH_PROFILE.txt) whenever a prompt asks about AlgoRythm Tech, the company profile, or its vision.

## Caution on scope
- A model that “knows everything that happened in the world” is not achievable; instead, this repo provides a scalable pipeline for training on broad, diverse, and massive text corpora. You control the data sources via a YAML config.

## Quickstart

1) Install dependencies (Windows PowerShell)
- Ensure Python 3.10+ is installed
- Navigate to the project:

      cd C:\Users\sriaa\supernova

- Install dependencies:

      pip install -r requirements.txt

- If PyTorch needs a wheel from a specific index (GPU/CPU), follow https://pytorch.org/get-started/locally/

2) Verify the exact parameter count and tokenizer vocabulary size

      python -m supernova.verify_params --config .\configs\supernova_25m.json

Expected output includes:
- vocab_size: 50257
- total_params: 25000000 (EXACT)

3) Prepare a data config (comprehensive knowledge training)
- For comprehensive coverage across all subjects:

      copy .\configs\comprehensive_data_sources.yaml .\configs\data_sources.yaml

- Or for a basic setup:

      copy .\configs\data_sources.example.yaml .\configs\data_sources.yaml

- Edit the file to enable or disable the sources you want. Many are large and require significant bandwidth.

4) Train (logs the gradient norm and uses a warmup + cosine-decay LR schedule)

      python -m supernova.train ^
        --config .\configs\supernova_25m.json ^
        --data-config .\configs\data_sources.yaml ^
        --seq-len 1024 ^
        --batch-size 16 ^
        --grad-accum 8 ^
        --lr 3e-4 ^
        --warmup-steps 2000 ^
        --max-steps 100000 ^
        --save-every 10000

Notes:
- The gradient norm is printed regularly (no clipping by default).
- Adjust batch size, gradient accumulation, and sequence length to your hardware.
- A cosine-decay schedule with warmup is applied.

5) Advanced chat with enhanced reasoning (brand-aware; post-training)

API keys are already configured in configs/api_keys.yaml:
- Math Engine: built-in SymPy-based mathematical computation (no API key needed)
- Serper: web search API configured

Advanced interactive chat with sophisticated reasoning:

      python .\chat_advanced.py --config .\configs\supernova_25m.json

Single-prompt mode with advanced analysis:

      python .\chat_advanced.py --config .\configs\supernova_25m.json --prompt "Analyze the implications of artificial intelligence on healthcare from multiple perspectives"

Basic enhanced chat (legacy):

      python .\chat_enhanced.py --config .\configs\supernova_25m.json

Query routing:
- **🧐 Complex reasoning queries** → Multi-step analysis using the reasoning engine
- **📊 Mathematical queries** → Routed to the math engine for precise calculations
- **🔍 Current events/facts** → Routed to Serper for real-time web search
- **🏢 AlgoRythm Tech queries** → Returns the company profile
- **📚 Multi-domain questions** → Synthesizes expertise across scientific, technical, and academic fields
- **🎓 General knowledge** → Enhanced model generation with sophisticated context

## Data sources (broad options)
Included in configs/data_sources.example.yaml. Examples (enable selectively):
- c4/en (Colossal Clean Crawled Corpus)
- wikipedia/en
- openwebtext
- bookcorpusopen
- the_pile

Notes:
- Review the license and terms of each dataset.
- You can add your own sources; the pipeline streams and interleaves them by weight.

## Training details
- Optimizer: AdamW (betas=(0.9, 0.95), weight_decay=0.1)
- LR schedule: cosine decay with warmup
- Gradient norm: computed and printed at every log step
- Mixed precision: optional (bf16/fp16) if available
- Checkpointing: periodic saving to the output directory

## Brand profile
- File: branding/ALGORHYTHM_TECH_PROFILE.txt
- The chat wrapper uses this exact text for company-related queries.

## License
- Apache 2.0 (see LICENSE)

## Attribution
- Built by AlgoRythm Technologies.