---
license: apache-2.0
library_name: transformers
tags:
- custom-code
- qwen2
- mla
- gqa
- attention-sinks
language:
- en
- zh
---
# PyraCode-1.5B
## 🔍 Model Overview
This is a custom-architecture model built on Qwen2.5-Coder-1.5B. It introduces an Asymmetric Hybrid Architecture (GQA + MLA) with Cross-Layer Shared Latent Gates and Attention Sinks, enabling efficient feature communication between layers and a reduced KV-cache memory footprint.
## 🏗️ Architecture Innovations
<!-- TODO: insert the architecture diagram generated with picture.py here; you can drag the image into the Hugging Face web editor to auto-generate a link. -->
Unlike standard Qwen2 models, this Hybrid-v9 backbone features:
- **Asymmetric Layers:**
  - **L0-L6:** standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
  - **L7 (Shared Hub):** generates a global latent vector $c_{kv}$ (rank 320).
  - **L8-L27:** soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
- **Shared Latent Gate:** deep layers dynamically access the global latent vector from L7 through a learnable gating mechanism (`warmup_alpha`).
- **HybridCache & Attention Sinks:** combines an 8192-token sliding window with a 64-token attention sink to keep generation stable at arbitrarily long sequence lengths.
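To make the gating idea concrete, here is a minimal PyTorch sketch of how a deep layer might blend its own low-rank latent with the shared hub latent. All module and parameter names here are hypothetical illustrations (only `warmup_alpha` and the rank of 320 come from this card); the actual implementation ships with the checkpoint's remote code.

```python
import torch
import torch.nn as nn


class SharedLatentGate(nn.Module):
    """Illustrative sketch of a cross-layer shared latent gate.

    A deep (MLA) layer interpolates between its own KV latent and the
    global latent c_kv produced by the shared hub layer (L7). The gate
    is a learnable scalar initialized from `warmup_alpha` (semantics
    assumed for illustration).
    """

    def __init__(self, latent_rank: int = 320, warmup_alpha: float = 0.1):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(warmup_alpha))
        self.proj = nn.Linear(latent_rank, latent_rank, bias=False)

    def forward(self, local_latent: torch.Tensor, hub_latent: torch.Tensor) -> torch.Tensor:
        # Sigmoid keeps the gate in (0, 1): g -> 0 ignores the hub,
        # g -> 1 relies entirely on the projected shared latent.
        g = torch.sigmoid(self.alpha)
        return (1.0 - g) * local_latent + g * self.proj(hub_latent)


gate = SharedLatentGate()
local = torch.randn(2, 16, 320)  # (batch, seq, rank): this layer's own latent
hub = torch.randn(2, 16, 320)    # global latent c_kv broadcast from L7
out = gate(local, hub)
print(out.shape)  # torch.Size([2, 16, 320])
```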
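The eviction policy behind the sink + sliding-window cache can be illustrated with a small helper that computes which KV-cache positions survive. This is an illustrative sketch of the policy described above, not the actual `HybridCache` implementation:

```python
def kept_positions(seq_len: int, sink: int = 64, window: int = 8192) -> list[int]:
    """Return the KV-cache positions retained under sink + sliding-window eviction.

    The first `sink` tokens are never evicted (attention sinks, which anchor
    attention scores); beyond that, only the most recent `window` tokens are kept.
    """
    sinks = list(range(min(sink, seq_len)))
    recent_start = max(seq_len - window, sink)
    recent = list(range(recent_start, seq_len))
    return sinks + recent


# At 10,000 generated tokens, the cache holds 64 sink tokens + 8192 recent tokens:
print(len(kept_positions(10_000)))  # 8256
```

Because the retained set is bounded at `sink + window` entries, cache memory stays constant no matter how long generation runs.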
## 🚀 Quick Start
> ⚠️ **IMPORTANT:** Because this model uses a custom architecture, you **must** pass `trust_remote_code=True` when loading it.
### Prerequisites

```bash
pip install transformers torch
```
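A minimal loading and generation sketch using the standard `transformers` API. The repo id below is a placeholder, since the card does not state the final Hub path; substitute the actual repository id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PyraCode-1.5B"  # placeholder: replace with the actual Hub repo id

# trust_remote_code=True is required: the hybrid GQA/MLA layers and the
# HybridCache live in this repository's modeling code, not in transformers.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)

prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```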