abcsk123/PyraCode-1.5B

	---
	library_name: transformers
	tags:
	- custom-code
	- qwen2
	- mla
	- gqa
	- attention-sinks
	license: apache-2.0
	language:
	- en
	- zh
	---

	# PyraCode-1.5B

	## 🌟 Model Overview
	PyraCode-1.5B is a custom-architecture model based on `Qwen2.5-Coder-1.5B`. We introduce an Asymmetric Hybrid Architecture (GQA + MLA) with Cross-Layer Shared Latent Gates and Attention Sinks, which enables efficient cross-layer feature communication and reduces the KV-cache memory footprint.
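
To make the KV-cache claim concrete, the sketch below counts cached elements per token under one possible reading of the layer layout given in this card. Everything in it is an assumption rather than the model's actual accounting: the helper `kv_cache_elements_per_token` is hypothetical, the 2 KV heads and head dimension of 128 are assumed for the base model and should be checked against `config.json`, L7 is counted as a GQA-style layer, and each of L8-L27 is assumed to cache only a rank-320 latent (any extra positional component is ignored).

```python
def kv_cache_elements_per_token(n_gqa_layers, n_latent_layers,
                                n_kv_heads, head_dim, latent_rank):
    """Count cached elements per token (hypothetical helper, not model code).

    GQA layers store one key and one value vector per KV head.
    Latent layers are ASSUMED to store a single low-rank vector instead.
    """
    gqa_elements = n_gqa_layers * 2 * n_kv_heads * head_dim
    latent_elements = n_latent_layers * latent_rank
    return gqa_elements + latent_elements


# Assumed base-model shape: 28 layers, 2 KV heads, head_dim 128 (verify in config.json).
baseline = kv_cache_elements_per_token(28, 0, n_kv_heads=2, head_dim=128, latent_rank=320)

# Assumed hybrid layout: L0-L7 cache full K/V, L8-L27 cache a rank-320 latent each.
hybrid = kv_cache_elements_per_token(8, 20, n_kv_heads=2, head_dim=128, latent_rank=320)

print(baseline, hybrid, hybrid / baseline)
```

Under these assumptions the hybrid layout caches fewer elements per token than an all-GQA stack. If the deep layers reuse the L7 latent instead of caching their own, the saving would be larger.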

	## 🏗️ Architecture Innovations
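	(Insert the architecture diagram generated with `picture.py` here. Dragging the image into the edit box of the Hugging Face web editor generates the link automatically.)
	![Hybrid Architecture](fill in your image link)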

	Unlike standard Qwen2 models, this `Hybrid-v9` backbone features:
	1. Asymmetric Layers:
	   * L0-L6: Standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
	   * L7 (Shared Hub): Generates a global latent vector $c_{kv}$ (rank 320).
	   * L8-L27: Soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
	2. Shared Latent Gate: Deep layers can dynamically access the global latent vector from L7 through a learnable gating mechanism (`warmup_alpha`).
	3. HybridCache & Attention Sinks: Combines a sliding window (8192 tokens) with a 64-token attention sink so that generation stays stable when sequences grow beyond the window.
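
The cache policy in item 3 can be illustrated with a minimal sketch of which token positions survive eviction. This is not the model's `HybridCache` implementation: `retained_positions` is a hypothetical helper, and it assumes the 64 sink tokens are kept in addition to the 8192-token window rather than counted inside it.

```python
def retained_positions(seq_len, window=8192, num_sink=64):
    """Return the token positions kept in the cache (illustrative only).

    ASSUMPTION: the first `num_sink` tokens are always kept, and the
    sliding window covers the most recent `window` tokens on top of them.
    """
    if seq_len <= num_sink + window:
        # Nothing has been evicted yet.
        return list(range(seq_len))
    sinks = list(range(num_sink))
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent


# Small numbers to make the behavior visible: 2 sink tokens, window of 4.
print(retained_positions(10, window=4, num_sink=2))  # [0, 1, 6, 7, 8, 9]
```

With the defaults above, the cache never holds more than 8256 positions per layer, however long generation runs.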

	## 🚀 Quick Start

	⚠️ IMPORTANT: Because this model uses a custom architecture, you MUST pass `trust_remote_code=True` when loading it.

	### Prerequisites
	```bash
	pip install transformers torch