---
license: apache-2.0
library_name: transformers
tags:
- custom-code
- qwen2
- mla
- gqa
- attention-sinks
language:
- en
- zh
---

# PyraCode-1.5B

## 🌟 Model Overview

This is a custom-architected model based on `Qwen2.5-Coder-1.5B`. We introduce a novel **Asymmetric Hybrid Architecture (GQA + MLA)** with **Cross-Layer Shared Latent Gates** and **Attention Sinks**, enabling efficient feature communication and a reduced KV-cache memory footprint.

## πŸ—οΈ Architecture Innovations

*(Insert the architecture diagram generated with `picture.py` here; dragging the image into the Hugging Face web editor will generate the link automatically.)*

![Hybrid Architecture](insert-your-image-link)

Unlike standard Qwen2 models, this `Hybrid-v9` backbone features:

1. **Asymmetric Layers:**
   * **L0-L6:** Standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
   * **L7 (Shared Hub):** Generates a global latent vector $c_{kv}$ (rank 320).
   * **L8-L27:** Soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
2. **Shared Latent Gate:** Deep layers can dynamically access the global latent vector from L7 via a learnable gating mechanism (`warmup_alpha`).
3. **HybridCache & Attention Sinks:** Implements a sliding window (8192 tokens) alongside a 64-token attention sink to keep generation stable at arbitrarily long sequence lengths.

## πŸš€ Quick Start

**⚠️ IMPORTANT:** Because this model uses a custom architecture, you **MUST** pass `trust_remote_code=True` when loading it.

### Prerequisites

```bash
pip install transformers torch
```
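### Loading the model

A minimal loading and generation sketch using the standard `transformers` API. The repo id below is a placeholder — substitute the actual Hub path; note the required `trust_remote_code=True`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual Hub path of this model.
model_id = "your-username/PyraCode-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # required: loads the custom Hybrid-v9 architecture code
)

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```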
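### Sketch: Shared Latent Gate

The Shared Latent Gate described above blends each deep layer's local KV latent with the global latent from L7. The sketch below is illustrative only (the function name and the exact blending rule are assumptions, not the model's actual implementation); it shows the basic idea of a `warmup_alpha`-controlled interpolation:

```python
import numpy as np

def gated_latent(local_kv: np.ndarray, shared_kv: np.ndarray,
                 warmup_alpha: float) -> np.ndarray:
    """Blend a deep layer's local KV latent with the shared L7 latent.

    Illustrative sketch: warmup_alpha=0 means the layer ignores the
    shared hub entirely; larger values mix in more of the global latent.
    """
    assert local_kv.shape == shared_kv.shape
    alpha = float(np.clip(warmup_alpha, 0.0, 1.0))
    return (1.0 - alpha) * local_kv + alpha * shared_kv

# Toy rank-320 latents, matching the rank stated in the card.
local = np.ones((4, 320))
shared = np.zeros((4, 320))
out = gated_latent(local, shared, warmup_alpha=0.25)
```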
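### Sketch: attention sink + sliding window

The HybridCache policy described above keeps the first 64 tokens (the attention sink) plus the most recent 8192 tokens. A minimal sketch of the eviction rule, not the actual `HybridCache` implementation:

```python
def visible_positions(seq_len: int, sink: int = 64, window: int = 8192) -> list:
    """Token positions retained in the KV cache: the first `sink`
    tokens plus the `window` most recent tokens (illustrative sketch)."""
    if seq_len <= sink + window:
        # Everything still fits; nothing is evicted yet.
        return list(range(seq_len))
    recent_start = seq_len - window
    return list(range(sink)) + list(range(recent_start, seq_len))
```

Because the sink tokens are never evicted, attention always has the same stable anchor positions regardless of how long generation runs.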
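### Sketch: SVD-initialized low-rank projections

The MLA layers above are described as using SVD-initialized low-rank projections. One common way to do this (shown here as an assumption, not necessarily this model's exact recipe) is to factor a pretrained projection matrix via truncated SVD into a down/up pair:

```python
import numpy as np

def svd_lowrank_init(weight: np.ndarray, rank: int = 320):
    """Factor a pretrained projection W of shape (out, in) into a
    low-rank pair (down, up) such that up @ down approximates W.
    Illustrative sketch of SVD initialization."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    up = u[:, :rank] * sqrt_s            # shape (out, rank)
    down = sqrt_s[:, None] * vt[:rank]   # shape (rank, in)
    return down, up

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 48))
# At full rank (min(out, in)) the factorization reconstructs W exactly.
down, up = svd_lowrank_init(w, rank=48)
```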