metadata
license: apache-2.0
tags:
  - causal-lm
  - experimental
  - pytorch
  - checkpoint
  - powergqa

PowerGQA 778M experimental checkpoint

Experimental causal language model checkpoint trained from scratch.

Checkpoint

  • File: ckpt_1500.pt
  • Step: 1500
  • Parameters: about 778.2M
  • Tokenizer: Qwen/Qwen2.5-0.5B tokenizer files, included under tokenizer/
  • Context length used in training: 1024 tokens
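The card gives only the 1024-token training context. The sketch below shows two common ways that number is used in practice. Both functions are hypothetical helpers, not code from this repository. `fit_prompt` left-truncates a prompt so that the prompt plus the generated tokens stay within 1024 positions, because longer inputs are likely out of distribution for this checkpoint. `training_windows` cuts a token stream into fixed 1024-token next-token-prediction pairs. The card does not say how the FineWeb-Edu documents were actually packed, so this packing scheme is an assumption.

```python
def fit_prompt(token_ids, max_new_tokens, context_len=1024):
    """Left-truncate a prompt so prompt + generated tokens fit in context_len."""
    if not 0 <= max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be in [0, context_len)")
    budget = context_len - max_new_tokens
    return token_ids[-budget:]  # keep the most recent tokens


def training_windows(token_stream, context_len=1024):
    """Cut a token stream into (inputs, targets) pairs for next-token prediction.

    Each window spans context_len + 1 tokens, and the targets are the inputs
    shifted by one position. A trailing remainder shorter than a full window
    is dropped.
    """
    pairs = []
    for start in range(0, len(token_stream) - context_len, context_len):
        chunk = token_stream[start:start + context_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs
```

For example, `fit_prompt(ids, max_new_tokens=128)` keeps at most the last 896 prompt tokens, which leaves room for 128 generated tokens inside the 1024-token window.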

Architecture

Custom PowerGQA block:

  • grouped-query attention
  • Q/K RMSNorm
  • RoPE
  • talking-head pre/post mixing
  • learnable per-head gates
  • depthwise local convolution residual
  • SwiGLU MLP
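The list names the block's components but not their exact formulation. The sketch below implements each listed piece in plain, dependency-free Python for a single sequence, as a rough illustration. All function names are hypothetical, and the wiring is assumed rather than taken from the included script:

- RMSNorm without a learned scale, applied to Q and K before RoPE.
- RoPE as a pairwise rotation with base 10000.
- Talking-heads mixing as learned n_heads × n_heads matrices. One is applied to the attention logits before the softmax and the other to the attention weights after it.
- Per-head gates as a sigmoid of one learnable scalar per head.
- A causal depthwise convolution added back residually.
- A standard SwiGLU MLP.

How these pieces are composed into one block (residual order, pre-norms, projections) is not shown.

```python
import math


def rms_norm(x, eps=1e-6):
    # RMSNorm with the learned scale omitted (assumed to be 1)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def rope(x, pos, base=10000.0):
    # Rotate each pair (x[i], x[i+1]) by pos * base**(-i/d); d must be even
    d, out = len(x), list(x)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def softmax(xs):
    m = max(xs)
    e = [math.exp(v - m) for v in xs]
    z = sum(e)
    return [v / z for v in e]


def powergqa_attention(q, k, v, n_kv_heads, pre_mix, post_mix, gate_logits):
    """q: [n_q][T][d]; k, v: [n_kv_heads][T][d]. Returns per-head outputs [n_q][T][d]."""
    n_q, T, d = len(q), len(q[0]), len(q[0][0])
    group = n_q // n_kv_heads  # grouped-query attention: `group` query heads share one K/V head
    # Q/K RMSNorm, then RoPE at each position
    qn = [[rope(rms_norm(q[h][t]), t) for t in range(T)] for h in range(n_q)]
    kn = [[rope(rms_norm(k[g][t]), t) for t in range(T)] for g in range(n_kv_heads)]
    out = [[None] * T for _ in range(n_q)]
    for t in range(T):
        keys = range(t + 1)  # causal: position t sees keys 0..t
        logits = [[sum(a * b for a, b in zip(qn[h][t], kn[h // group][j])) / math.sqrt(d)
                   for j in keys] for h in range(n_q)]
        # Talking-heads pre-mix: each head's logits are a learned mix of all heads' logits
        mixed = [[sum(pre_mix[h][g] * logits[g][j] for g in range(n_q)) for j in keys]
                 for h in range(n_q)]
        probs = [softmax(row) for row in mixed]
        # Talking-heads post-mix: the same idea applied to the softmax weights
        probs = [[sum(post_mix[h][g] * probs[g][j] for g in range(n_q)) for j in keys]
                 for h in range(n_q)]
        for h in range(n_q):
            gate = 1.0 / (1.0 + math.exp(-gate_logits[h]))  # learnable per-head gate
            vals = v[h // group]
            out[h][t] = [gate * sum(probs[h][j] * vals[j][c] for j in keys) for c in range(d)]
    return out


def depthwise_causal_conv_residual(x, weights):
    # x: [T][C]; weights: [C][K]. Each channel is convolved only with its own
    # past K values (depthwise, causal, zero-padded), then added back to x.
    T, C = len(x), len(x[0])
    return [[x[t][c] + sum(weights[c][i] * x[t - i][c]
                           for i in range(len(weights[c])) if t - i >= 0)
             for c in range(C)] for t in range(T)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def silu(z):
    return z / (1.0 + math.exp(-z))


def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU MLP: down(silu(gate(x)) * up(x))
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)
```

Plain Python keeps the sketch readable and dependency-free, but it is neither batched nor numerically hardened. A real implementation would express the same steps as batched PyTorch tensor operations.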

This is a raw research checkpoint, not a polished instruction model. Load it with the model definitions in the included train_powergqa_500m.py.
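The included script is a standalone .py file, not an installed package. One way to reuse its definitions is to import it directly from its path with the standard library's `importlib`. The sketch below is an assumption, not a documented loading procedure. `load_definitions` and the default module name `powergqa_defs` are hypothetical, and the card does not say which class in the script to instantiate or how the saved checkpoint is structured.

```python
import importlib.util
import sys
from pathlib import Path


def load_definitions(script_path, module_name="powergqa_defs"):
    """Import a standalone .py file as a module registered under `module_name`.

    Because the file is not executed as __main__, a training entry point
    guarded by `if __name__ == "__main__":` does not run on import.
    """
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot create an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, following the Python docs' recipe for
    # importing a source file directly, so sys.modules lookups by name work.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

Pointed at the included script, this returns a module object whose classes and functions can then be used to build the model before loading the checkpoint weights. If the script runs training at import time without a `__main__` guard, importing it this way will still execute that code.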

Training notes

Phase 1 trained on filtered FineWeb-Edu. After this checkpoint, local training switched to a no-code QA/reasoning curriculum. The checkpoint in this repository snapshot, ckpt_1500.pt, was saved before that switch.