---
library_name: pytorch
tags:
- causal-lm
- from-scratch
- checkpoint
- powergqa
---

# PowerGQA 778M checkpoint

Custom causal LM checkpoint, trained from scratch.

- checkpoint: `ckpt_6700.pt`
- parameters: about 778.2M
- tokenizer: Qwen2.5 tokenizer
- context length during training: 1024
- architecture: PowerGQA, 22 layers, dim 1536, 24 query heads, 6 KV heads
- training phase: Phase10 scenario/arithmetic/logic repair curriculum

This is a raw PyTorch training checkpoint, not a Transformers-converted model. To load it, use the model architecture defined in the included training script.
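
The attention shapes implied by the architecture figures above can be worked out with a few lines of arithmetic. This is a minimal sketch, not taken from the training script: it assumes the model dimension is split evenly across the query heads (so the head dimension is `dim / query heads`) and that the KV cache stores one key and one value vector per layer, KV head, and position. The constant and function names are hypothetical.

```python
# Figures taken from the model card.
N_LAYERS = 22
DIM = 1536
N_Q_HEADS = 24
N_KV_HEADS = 6
CONTEXT_LEN = 1024

# Assumption (not stated in the card): the model dim is split evenly across query heads.
assert DIM % N_Q_HEADS == 0
head_dim = DIM // N_Q_HEADS

# Grouped-query attention: each KV head is shared by a group of query heads.
assert N_Q_HEADS % N_KV_HEADS == 0
q_heads_per_kv_head = N_Q_HEADS // N_KV_HEADS


def kv_cache_bytes(seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size for one sequence.

    Counts one key and one value vector (hence the factor 2) per layer,
    per KV head, per position. bytes_per_elem=2 corresponds to fp16/bf16.
    """
    return 2 * N_LAYERS * N_KV_HEADS * head_dim * seq_len * bytes_per_elem


print("head_dim:", head_dim)
print("query heads per KV head:", q_heads_per_kv_head)
print("KV cache at training context (MiB):", kv_cache_bytes(CONTEXT_LEN) / 2**20)
```

Under these assumptions the head dimension is 64, four query heads share each KV head, and a full 1024-token KV cache in 16-bit precision takes 33 MiB per sequence, a quarter of what 24 independent KV heads would need.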