| license: apache-2.0 | |
| tags: | |
| - causal-lm | |
| - experimental | |
| - pytorch | |
| - checkpoint | |
| - powergqa | |
| # PowerGQA 778M experimental checkpoint | |
| Experimental causal language model checkpoint trained from scratch. | |
| ## Checkpoint | |
| - File: ckpt_1500.pt | |
| - Step: 1500 | |
| - Parameters: about 778.2M | |
| - Tokenizer: Qwen/Qwen2.5-0.5B tokenizer files, included under tokenizer/ | |
| - Context length used in training: 1024 | |
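As a quick sanity check on the file listed above, the following is a minimal sketch for opening `ckpt_1500.pt` on CPU and counting its stored elements. The internal layout of the checkpoint is not documented here, so the candidate key names (`model`, `state_dict`, `model_state_dict`) and the helper names are assumptions for illustration, not the repo's actual API.

```python
from collections.abc import Mapping


def load_checkpoint(path="ckpt_1500.pt"):
    """Load the checkpoint onto CPU. Requires PyTorch."""
    import torch  # imported lazily so the helpers below work without it

    # weights_only=True restricts unpickling to tensors and plain containers.
    # It may need to be relaxed if the file stores custom Python objects.
    return torch.load(path, map_location="cpu", weights_only=True)


def find_state_dict(ckpt):
    """Return the mapping that most likely holds the weights.

    ASSUMPTION: the weights are either the top-level object or sit under one
    of the guessed keys below. The real layout may differ.
    """
    for key in ("model", "state_dict", "model_state_dict"):  # guessed keys
        if isinstance(ckpt, Mapping) and isinstance(ckpt.get(key), Mapping):
            return ckpt[key]
    return ckpt


def count_parameters(state_dict):
    """Sum element counts over every entry that exposes numel()."""
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))
```

If those assumptions hold, `count_parameters(find_state_dict(load_checkpoint()))` should land near the 778.2M figure above. A state dict can also contain non-trainable buffers, so a small difference would not necessarily indicate a problem.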
| ## Architecture | |
| Custom PowerGQA block: | |
| - grouped-query attention | |
| - Q/K RMSNorm | |
| - RoPE | |
| - talking-head pre/post mixing | |
| - learnable per-head gates | |
| - depthwise local convolution residual | |
| - SwiGLU MLP | |
| This is a raw research checkpoint, not a polished instruction-tuned model. Load it with the model definitions in the included rain_powergqa_500m.py. | |
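For readers unfamiliar with some of the listed components, here is a dependency-free sketch of four of them in their textbook form: the grouped-query head mapping, RMSNorm, RoPE, and SwiGLU. It is not taken from the included model definitions. The function names, the interleaved RoPE pairing, the `base=10000.0` default, and the `eps` value are all assumptions for illustration. The talking-head mixing, per-head gates, and depthwise convolution are left out because their exact form is not described here.

```python
import math


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one K/V head."""
    assert n_q_heads % n_kv_heads == 0
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def rope(x, position, base=10000.0):
    """Rotate each adjacent pair (x[i], x[i+1]) by position * base**(-i/d).

    ASSUMPTION: interleaved pairing. Many implementations instead pair
    element k with element k + d/2.
    """
    d = len(x)
    assert d % 2 == 0
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def matvec(m, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU MLP: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)
```

With 8 query heads and 2 K/V heads, for example, `kv_head_for_query_head(5, 8, 2)` maps query head 5 to K/V head 1. In a Q/K RMSNorm design, `rms_norm` would be applied to each head's query and key vectors before the attention scores are computed. Whether this checkpoint normalizes before or after RoPE is not stated here.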
| ## Training notes | |
| Phase 1 trained on filtered FineWeb-Edu. After this checkpoint, local training switched to a no-code QA/reasoning curriculum; this repo snapshot contains the saved ckpt_1500.pt, which predates that switch. | |