---
license: mit
datasets:
  - tokyotech-llm/swallow-code-v2
language:
  - en
  - zh
metrics:
  - accuracy
base_model:
  - Qwen/Qwen2.5-Coder-1.5B
library_name: transformers
tags:
  - Qwen
  - HybridArch
  - sinkAttention
  - MLA
  - GQA
---

# PyraCode-1.5B

## 🌟 Model Overview

This is a custom-architected model based on Qwen2.5-Coder-1.5B. It introduces a novel Asymmetric Hybrid Architecture (GQA + MLA) with Cross-Layer Shared Latent Gates and Attention Sinks, enabling efficient feature communication across layers and a reduced KV-cache memory footprint.

πŸ—οΈ Architecture Innovations

(θΏ™ι‡Œζ’ε…₯你用 picture.py η”Ÿζˆηš„ζžΆζž„ε›ΎοΌŒε―δ»₯ζŠŠε›Ύη‰‡ζ‹–θΏ› Hugging Face η½‘ι‘΅η‰ˆηš„ηΌ–θΎ‘ζ‘†ι‡Œθ‡ͺεŠ¨η”Ÿζˆι“ΎζŽ₯) Hybrid Architecture

Unlike standard Qwen2 models, this Hybrid-v9 backbone features:

1. **Asymmetric Layers:**
   - **L0–L6:** Standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
   - **L7 (Shared Hub):** Generates a global latent vector $c_{kv}$ (rank 320).
   - **L8–L27:** Soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
2. **Shared Latent Gate:** Deep layers dynamically access the global latent vector from L7 via a learnable gating mechanism (`warmup_alpha`).
3. **HybridCache & Attention Sinks:** A sliding window (8192 tokens) combined with a 64-token attention sink maintains generation stability at unbounded sequence lengths.
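Items 2 and 3 above can be sketched in a few lines. The following is a minimal NumPy illustration under stated assumptions: the exact role of `warmup_alpha`, the up-projection `w_up`, and all tensor shapes are assumptions for illustration, not the model's actual implementation.

```python
import numpy as np

def shared_latent_gate(hidden, c_kv, w_up, warmup_alpha):
    """Item 2 (sketch): blend a deep layer's local hidden states with the
    global latent vector c_kv produced by the shared hub layer (L7).

    hidden       : (seq, d_model) local hidden states
    c_kv         : (rank,)        shared latent vector (rank 320 in the model)
    w_up         : (rank, d_model) assumed up-projection back to model dim
    warmup_alpha : gate value in [0, 1], assumed to ramp up from 0 in warmup
    """
    global_ctx = c_kv @ w_up  # (d_model,), broadcast over the sequence axis
    return (1.0 - warmup_alpha) * hidden + warmup_alpha * global_ctx

def kept_cache_positions(cache_len, sink=64, window=8192):
    """Item 3 (sketch): KV-cache positions retained by a sliding window with
    an attention sink -- the first `sink` tokens plus the trailing `window`."""
    if cache_len <= sink + window:
        return list(range(cache_len))
    return list(range(sink)) + list(range(cache_len - window, cache_len))

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))
out = shared_latent_gate(h, rng.standard_normal(5), rng.standard_normal((5, 8)), 0.0)
print(np.allclose(out, h))                # True: a closed gate passes hidden through
print(len(kept_cache_positions(10_000)))  # 8256 = 64 sink + 8192 window
```

With the gate closed (`warmup_alpha = 0`) deep layers behave like plain attention layers; as the gate opens they mix in the shared latent context, which is one common way to stabilize training of a newly added pathway.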

## 🚀 Quick Start

> ⚠️ **IMPORTANT:** Because this model uses a custom architecture, you **must** pass `trust_remote_code=True` when loading it.

### Prerequisites

```bash
pip install transformers torch
```
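A minimal loading and generation sketch using the standard `transformers` API. The model id and the prompt are placeholders (the full hub repo id is not stated in this README); downloading the weights requires network access.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PyraCode-1.5B"  # replace with the full hub id of this repo

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # required: custom Hybrid-v9 architecture
    torch_dtype="auto",
    device_map="auto",
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```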