Asilarknes/powergqa-778m


	---
	license: apache-2.0
	tags:
	- causal-lm
	- experimental
	- pytorch
	- checkpoint
	- powergqa
	---

	# PowerGQA 778M experimental checkpoint

	Experimental causal language model checkpoint trained from scratch.

	## Checkpoint

	- File: ckpt_1500.pt
	- Step: 1500
	- Parameters: about 778.2M
	- Tokenizer: Qwen/Qwen2.5-0.5B; tokenizer files are included under tokenizer/
	- Context length used in training: 1024
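A quick sanity check on a raw checkpoint is to load it on CPU and count its parameters. The sketch below is only an illustration: the internal layout of ckpt_1500.pt is not documented here, so the candidate keys (`model`, `state_dict`, `model_state_dict`) and the helper names are assumptions, not the author's format.

```python
"""Inspect a raw PyTorch checkpoint and count its parameters (hypothetical helpers)."""


def load_checkpoint(path="ckpt_1500.pt"):
    # torch is imported lazily so the helpers below work without it installed.
    import torch

    # weights_only=True refuses to unpickle arbitrary Python objects; if the
    # file stores extra objects, this call may raise and need adjusting.
    return torch.load(path, map_location="cpu", weights_only=True)


def find_state_dict(ckpt, candidate_keys=("model", "state_dict", "model_state_dict")):
    """Return the weight mapping, assuming one of a few common layouts."""
    for key in candidate_keys:
        if isinstance(ckpt, dict) and isinstance(ckpt.get(key), dict):
            return ckpt[key]
    return ckpt  # assume the file is already a flat name -> tensor mapping


def count_parameters(state_dict):
    """Sum element counts over all tensor-like entries."""
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))


# Usage (requires torch and the checkpoint file):
# state = find_state_dict(load_checkpoint())
# print(f"{count_parameters(state) / 1e6:.1f}M parameters")
```

The count may differ slightly from the stated figure if the state dict also holds non-trainable buffers or stores tied weights more than once.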

	## Architecture

	Custom PowerGQA block:

	- grouped-query attention
	- Q/K RMSNorm
	- RoPE
	- talking-head pre/post mixing
	- learnable per-head gates
	- depthwise local convolution residual
	- SwiGLU MLP
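For readers unfamiliar with some of the components listed above, here is a dependency-free sketch of four of them: the grouped-query head mapping, RMSNorm as applied to Q/K vectors, RoPE, and a SwiGLU MLP. It uses the textbook formulations, not the code in this repo. The function names, the interleaved-pair RoPE convention, the base of 10000, and the absence of a learned RMSNorm scale are all assumptions. Talking-head mixing, per-head gates, and the depthwise convolution are omitted because their exact form is not specified here.

```python
import math


def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: consecutive query heads share one K/V head."""
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale: divide by the root mean square."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def apply_rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]), i even, by position * base**(-i/d)."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), weights as lists of rows."""
    def matvec(w, v):
        return [sum(a * b for a, b in zip(row, v)) for row in w]

    def silu(z):
        return z / (1.0 + math.exp(-z))

    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


# One attention logit for a single head: normalize, rotate, then scaled dot product.
q = apply_rope(rms_norm([1.0, 2.0, 3.0, 4.0]), position=5)
k = apply_rope(rms_norm([0.5, -1.0, 2.0, 0.0]), position=3)
score = sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
```

Because RoPE is a pure rotation, the logit depends only on the relative offset between the query and key positions, not on their absolute values.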

	This is a raw research checkpoint, not a polished instruction model. Load it with the model definitions in the included train_powergqa_500m.py.
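Raw training checkpoints often carry key prefixes from wrappers used during training: `torch.compile` adds `_orig_mod.` and `DistributedDataParallel` adds `module.`. Whether this checkpoint has such prefixes is not stated, so the helper below is a precaution under that assumption. `strip_wrapper_prefixes` is a hypothetical name, and the model class must come from the included definitions script.

```python
def strip_wrapper_prefixes(state_dict, prefixes=("_orig_mod.", "module.")):
    """Return a copy of state_dict with known wrapper prefixes removed from keys."""
    cleaned = {}
    for name, tensor in state_dict.items():
        stripped = True
        while stripped:  # prefixes can be nested, e.g. "module._orig_mod."
            stripped = False
            for prefix in prefixes:
                if name.startswith(prefix):
                    name = name[len(prefix):]
                    stripped = True
        cleaned[name] = tensor
    return cleaned


# Usage (assumes `model` was built from the included definitions and `state`
# is the name -> tensor mapping read from the checkpoint):
# result = model.load_state_dict(strip_wrapper_prefixes(state), strict=False)
# print(result.missing_keys, result.unexpected_keys)
```

Loading with `strict=False` and printing the missing and unexpected keys makes any mismatch between the checkpoint and the model definition visible instead of failing on the first difference.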

	## Training notes

	Phase 1 used filtered FineWeb-Edu. After this checkpoint, local training switched to a no-code QA/reasoning curriculum; this repo snapshot contains the saved ckpt_1500.pt, which predates that switch.