---
license: apache-2.0
language:
  - en
base_model:
  - algorythmtechnologies/Supernova25million
---

Supernova (25M) — AlgoRythm Technologies

Enhanced AI Assistant with Tool Integration

Supernova is a 25,000,000-parameter decoder-only Transformer built from scratch. It uses the GPT‑2 tokenizer (vocab size 50,257) and meets its parameter budget exactly: 25,000,000 parameters, not one more.

🚀 Enhanced with Advanced AI Capabilities:

  • 🧠 Advanced Reasoning Engine: Multi-step problem solving, knowledge synthesis, domain expertise analysis
  • 📊 Math Engine Integration: Advanced mathematical computations, scientific calculations, engineering equations
  • 🔍 Serper Web Search: Real-time information, current events, factual queries
  • 🎓 Multi-Domain Expertise: Science, Technology, Medicine, Business, Humanities, Arts
  • ⚡ Smart Tool Coordination: Intelligent routing and chaining of multiple tools for complex queries
  • 🔬 Sophisticated Analysis: Context-aware responses with evidence synthesis and comprehensive reasoning

Key specs:

  • Exact params: 25,000,000
  • Tokenizer: GPT‑2 (vocab_size = 50,257)
  • d_model: 320
  • n_layers: 6
  • n_heads: 10 (head_dim = 32)
  • n_positions: 4,748 (learned positional embeddings)
  • MLP ratio: 4.0 (hidden_size = 4 × d_model)
  • Weight tying: yes (LM head shares token embedding weights; no LM head bias)
  • Dropout: configurable (default 0.1)
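The specs above map naturally to a config file. A hypothetical sketch of what configs/supernova_25m.json might contain (field names here are illustrative assumptions, not the repo's actual schema):

```json
{
  "vocab_size": 50257,
  "d_model": 320,
  "n_layers": 6,
  "n_heads": 10,
  "n_positions": 4748,
  "mlp_ratio": 4.0,
  "tie_weights": true,
  "dropout": 0.1
}
```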

Why these numbers? They are chosen so that the total parameter count equals exactly 25,000,000 with GPT‑2 vocab size, using learned positional embeddings and tied output head.

Parameter proof sketch (matches code):

  • Token embeddings: 50,257 × 320 = 16,082,240
  • Positional embeddings: 4,748 × 320 = 1,519,360
  • Per block: 12·d^2 + 13·d = 12·(320^2) + 13·320 = 1,228,800 + 4,160 = 1,232,960
  • 6 blocks total: 7,397,760
  • Final LayerNorm: 2·d = 640
  • Total = 16,082,240 + 1,519,360 + 7,397,760 + 640 = 25,000,000

The verification script (supernova/verify_params.py) asserts this at runtime.
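The arithmetic above can also be checked independently with a few lines of Python (standalone; this does not import the repo's code):

```python
# Recompute Supernova's exact parameter count from the architecture numbers.
vocab_size = 50_257
d = 320            # d_model
n_layers = 6
n_positions = 4_748

tok_emb = vocab_size * d            # token embeddings (tied with the LM head)
pos_emb = n_positions * d           # learned positional embeddings
per_block = 12 * d**2 + 13 * d      # attention + MLP + 2 LayerNorms per block
final_ln = 2 * d                    # final LayerNorm (weight + bias)

total = tok_emb + pos_emb + n_layers * per_block + final_ln
print(total)  # 25000000
```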

Brand behavior:

  • The chat wrapper will return the AlgoRythm Tech – Company Profile & Vision text (branding/ALGORHYTHM_TECH_PROFILE.txt) when a prompt asks about AlgoRythm Tech/company profile/vision.

Caution on scope:

  • “Knows everything that happened in the world” is not achievable in a single model; instead, this repo provides a scalable pipeline to train on broad, diverse, and massive text corpora. You control the data sources via a YAML config.

Quickstart

  1. Install dependencies (Windows PowerShell)
     • Ensure Python 3.10+ is installed.
     • Navigate to the project: cd C:\Users\sriaa\supernova
     • Install dependencies: pip install -r requirements.txt
     • If the PyTorch wheel needs a specific index (GPU/CPU), follow https://pytorch.org/get-started/locally/
  2. Verify the exact parameter count and tokenizer vocabulary size:

     python -m supernova.verify_params --config .\configs\supernova_25m.json

     Expected output includes:
     • vocab_size: 50257
     • total_params: 25000000 (EXACT)
  3. Prepare the data config (comprehensive knowledge training)
     • For comprehensive coverage across all subjects: copy .\configs\comprehensive_data_sources.yaml .\configs\data_sources.yaml
     • Or for a basic setup: copy .\configs\data_sources.example.yaml .\configs\data_sources.yaml
     • Edit the file and enable or disable sources as needed. Many are large and require significant bandwidth.
  4. Train (logs the gradient norm and uses a cosine-with-warmup LR schedule):

     python -m supernova.train ^
       --config .\configs\supernova_25m.json ^
       --data-config .\configs\data_sources.yaml ^
       --seq-len 1024 ^
       --batch-size 16 ^
       --grad-accum 8 ^
       --lr 3e-4 ^
       --warmup-steps 2000 ^
       --max-steps 100000 ^
       --save-every 10000

     Notes:
     • The gradient norm is printed regularly (no clipping by default).
     • Adjust batch size, gradient accumulation, and sequence length to your hardware.
     • A cosine decay schedule with warmup is applied.
  5. Advanced Chat with Enhanced Reasoning (brand-aware; post-training)

API keys are already configured in configs/api_keys.yaml

  • Math Engine: built-in SymPy-based mathematical computation (no API key needed)
  • Serper: web search API (key configured)

Advanced interactive chat with sophisticated reasoning

python .\chat_advanced.py --config .\configs\supernova_25m.json

Single prompt mode with advanced analysis

python .\chat_advanced.py --config .\configs\supernova_25m.json --prompt "Analyze the implications of artificial intelligence on healthcare from multiple perspectives"

Basic enhanced chat (legacy)

python .\chat_enhanced.py --config .\configs\supernova_25m.json

The chat wrapper routes each query to the appropriate capability:

  • 🧐 Complex reasoning queries → Multi-step analysis using the reasoning engine
  • 📊 Mathematical queries → Routed to math engine for precise calculations
  • 🔍 Current events/facts → Routed to Serper for real-time web search
  • 🏢 AlgoRythm Tech queries → Returns company profile
  • 📚 Multi-domain questions → Synthesizes expertise across scientific, technical, and academic fields
  • 🎓 General knowledge → Enhanced model generation with sophisticated context
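A minimal sketch of how such routing might be wired. The keyword rules and function name below are illustrative assumptions, not the actual logic in chat_advanced.py:

```python
# Hypothetical keyword-based tool router, in the spirit of the routing table above.
def route_query(query: str) -> str:
    q = query.lower()
    if any(k in q for k in ("algorythm tech", "company profile", "vision")):
        return "brand_profile"      # return the fixed company-profile text
    if any(k in q for k in ("calculate", "solve", "equation", "integral")):
        return "math_engine"        # SymPy-based computation
    if any(k in q for k in ("latest", "today", "current", "news")):
        return "web_search"         # Serper real-time search
    if any(k in q for k in ("analyze", "compare", "implications")):
        return "reasoning_engine"   # multi-step analysis
    return "model_generation"       # default: plain LM generation

print(route_query("What is AlgoRythm Tech's vision?"))  # brand_profile
```

A real implementation would likely chain tools for complex queries rather than pick a single route, but the dispatch idea is the same.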

Data sources (broad options)

  • Included in configs/data_sources.example.yaml. Example (enable selectively):
    • c4/en (Colossal Clean Crawled Corpus)
    • wikipedia/en
    • openwebtext
    • bookcorpusopen
    • the_pile

  Notes:
  • Review licenses and terms of each dataset.
  • You can add your own sources. The pipeline streams and interleaves by weight.
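A hypothetical sketch of what a weighted source entry in data_sources.yaml might look like (field names are illustrative assumptions; check configs/data_sources.example.yaml for the real schema):

```yaml
# Illustrative data-mixture config: each source gets a sampling weight.
sources:
  - name: wikipedia/en
    enabled: true
    weight: 0.3
  - name: c4/en
    enabled: true
    weight: 0.5
  - name: openwebtext
    enabled: false   # disabled sources are skipped by the loader
    weight: 0.2
```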

Training details

  • Optimizer: AdamW (betas=(0.9, 0.95), weight_decay=0.1)
  • LR schedule: cosine decay with warmup
  • Gradient norm: computed every log step and printed
  • Mixed precision: optional (bf16/fp16) if available
  • Checkpointing: periodic saving to output directory
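The schedule described above can be sketched as a small function. This is a generic cosine-with-warmup implementation assumed to mirror what supernova/train does, not code copied from it:

```python
import math

def lr_at(step: int, base_lr: float = 3e-4,
          warmup_steps: int = 2000, max_steps: int = 100_000,
          min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = min((step - warmup_steps) / (max_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))        # 0.0 at step zero
print(lr_at(2000))     # peak LR (3e-4) at the end of warmup
print(lr_at(100_000))  # decayed to min_lr by max_steps
```

The defaults match the flags shown in the Quickstart train command (--lr 3e-4, --warmup-steps 2000, --max-steps 100000).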

Brand profile

  • File: branding/ALGORHYTHM_TECH_PROFILE.txt
  • The chat wrapper uses this exact text for company-related queries.

License

  • Apache 2.0 (see LICENSE)

Attribution

  • Built by AlgoRythm Technologies.