---
license: apache-2.0
tags:
- causal-lm
- experimental
- pytorch
- checkpoint
- powergqa
---
# PowerGQA 778M experimental checkpoint
Experimental causal language model checkpoint trained from scratch.
## Checkpoint
- File: `ckpt_1500.pt`
- Step: 1500
- Parameters: about 778.2M
- Tokenizer: Qwen/Qwen2.5-0.5B (tokenizer files included under `tokenizer/`)
- Context length used in training: 1024
## Architecture
Custom PowerGQA block:
- grouped-query attention
- Q/K RMSNorm
- RoPE
- talking-head pre/post mixing
- learnable per-head gates
- depthwise local convolution residual
- SwiGLU MLP
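
To make the grouped-query attention item concrete, here is a minimal sketch of how query heads share K/V heads and how that sharing shrinks the attention projection weights. The model width and head counts are hypothetical placeholders, not this checkpoint's configuration, and the contiguous head grouping and bias-free projections are assumed conventions rather than details confirmed by the training script.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the shared K/V head used by a given query head.

    Assumes contiguous grouping: consecutive query heads share one K/V head.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def attention_projection_params(d_model: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Count weights in the Q, K, V and output projections (no biases assumed)."""
    if d_model % n_q_heads != 0:
        raise ValueError("d_model must be a multiple of n_q_heads")
    head_dim = d_model // n_q_heads
    q_proj = d_model * n_q_heads * head_dim
    kv_proj = 2 * d_model * n_kv_heads * head_dim  # K and V use fewer heads
    out_proj = n_q_heads * head_dim * d_model
    return q_proj + kv_proj + out_proj


# Hypothetical sizes for illustration only (NOT the PowerGQA 778M config).
D_MODEL, N_Q_HEADS, N_KV_HEADS = 1024, 16, 4

print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
print("GQA params per layer:", attention_projection_params(D_MODEL, N_Q_HEADS, N_KV_HEADS))
print("MHA params per layer:", attention_projection_params(D_MODEL, N_Q_HEADS, N_Q_HEADS))
```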

This is a raw research checkpoint, not a polished instruction model. Load it with the model definitions in the included `train_powergqa_500m.py`.
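
Before wiring the checkpoint into the included model definitions, it can help to inspect what the file actually contains. The sketch below only loads the raw object and counts tensor elements; it does not build the model. The layout of the checkpoint is not documented here, so the candidate key names (`model`, `state_dict`, `model_state_dict`) are guesses at common conventions, and the helper names are hypothetical.

```python
from collections.abc import Mapping


def load_raw_checkpoint(path: str):
    """Load a .pt file onto the CPU without constructing the model."""
    import torch  # imported lazily; the helpers below do not need PyTorch

    # weights_only=True refuses arbitrary pickled objects. If the checkpoint
    # stores non-tensor Python objects, this call may fail and need revisiting.
    return torch.load(path, map_location="cpu", weights_only=True)


def find_state_dict(ckpt, candidate_keys=("model", "state_dict", "model_state_dict")):
    """Return a nested weights mapping if one is found, else the object itself.

    The candidate keys are assumptions, not confirmed for this checkpoint.
    """
    if isinstance(ckpt, Mapping):
        for key in candidate_keys:
            if isinstance(ckpt.get(key), Mapping):
                return ckpt[key]
    return ckpt


def count_parameters(state_dict) -> int:
    """Sum numel() over every tensor-like value, skipping non-tensor entries."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
```

With the file downloaded locally, `count_parameters(find_state_dict(load_raw_checkpoint("ckpt_1500.pt")))` should land near the 778.2M figure listed above if the weights are stored as a plain state dict; registered buffers or tied weights stored twice would shift the total.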
## Training notes
Phase 1 used filtered FineWeb-Edu. After this checkpoint was saved, local training switched to a no-code QA/reasoning curriculum; this repo snapshot contains the saved `ckpt_1500.pt`, which predates that switch.