abcsk123/PyraCode-1.5B

	---
	license: mit
	datasets:
	- tokyotech-llm/swallow-code-v2
	language:
	- en
	- zh
	metrics:
	- accuracy
	base_model:
	- Qwen/Qwen2.5-Coder-1.5B
	library_name: transformers
	tags:
	- Qwen
	- HybridArch
	- sinkAttention
	- MLA
	- GQA
	---


	# PyraCode-1.5B

	## 🌟 Model Overview
	This model uses a custom architecture built on `Qwen2.5-Coder-1.5B`. It introduces an Asymmetric Hybrid Architecture (GQA + MLA) with Cross-Layer Shared Latent Gates and Attention Sinks, intended to enable efficient cross-layer feature communication and reduce the KV-cache memory footprint.

	## 🏗️ Architecture Innovations

	![PyraCode-1.5B_architecture](https://cdn-uploads.huggingface.co/production/uploads/67cd51087c6e6ea1cc18d236/0OVAfwp-zgh57USazP-iA.png)

	Unlike standard Qwen2 models, this `Hybrid-v9` backbone features:
	1. Asymmetric Layers:
	   * L0-L6: Standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
	   * L7 (Shared Hub): Generates a global latent vector $c_{kv}$ (rank 320).
	   * L8-L27: Soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
	2. Shared Latent Gate: Deep layers can dynamically access the global latent vector from L7 through a learnable gating mechanism (`warmup_alpha`).
	3. HybridCache & Attention Sinks: Combines a sliding window (8192 tokens) with a 64-token attention sink to keep generation stable on arbitrarily long sequences.
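The layer layout and cache policy listed above can be sketched in a few lines of Python. This is only an illustration of the numbers given in this card, not the model's actual `HybridCache` code: the names `layer_kind` and `retained_positions` are hypothetical, and the sketch assumes the 64 sink tokens are kept in addition to the 8192-token window.

```python
from typing import List

NUM_LAYERS = 28     # L0-L27, as listed in the card
HUB_LAYER = 7       # L7 generates the shared latent vector c_kv
SINK_TOKENS = 64    # attention-sink size from the card
WINDOW = 8192       # sliding-window size from the card


def layer_kind(idx: int) -> str:
    """Return the attention type the card assigns to layer `idx` (hypothetical helper)."""
    if not 0 <= idx < NUM_LAYERS:
        raise ValueError(f"layer index {idx} out of range")
    if idx < HUB_LAYER:
        return "gqa"
    if idx == HUB_LAYER:
        return "shared_hub"
    return "soft_mla"


def retained_positions(seq_len: int,
                       sink: int = SINK_TOKENS,
                       window: int = WINDOW) -> List[int]:
    """Token positions kept in the cache under a sink + sliding-window policy.

    Assumption: the first `sink` tokens are always kept, plus the most
    recent `window` tokens. The real implementation may count these differently.
    """
    if seq_len <= sink + window:
        # Nothing needs to be evicted yet.
        return list(range(seq_len))
    return list(range(sink)) + list(range(seq_len - window, seq_len))


if __name__ == "__main__":
    print([layer_kind(i) for i in range(NUM_LAYERS)])
    print(len(retained_positions(20_000)))
```

Under these assumptions the cache size stops growing once the sequence exceeds the sink plus the window, which is what keeps memory bounded during long generations.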

	## 🚀 Quick Start

	⚠️ IMPORTANT:
	This project is not yet complete, and the current weights are not a very good tradeoff.
	If I obtain new training results in the future, I will continue to update them here.

	If you decide to test these imperfect weights, please be aware:
	Because this model uses a custom architecture, you MUST pass `trust_remote_code=True` when loading it.
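A minimal loading sketch, assuming the packages from the Prerequisites section are installed and that the repository exposes its custom classes through the standard `Auto*` entry points (this card does not confirm which class to use, so `AutoModelForCausalLM` is an assumption). The helper names `load_pyracode` and `generate` are hypothetical.

```python
REPO_ID = "abcsk123/PyraCode-1.5B"  # repository name as shown on the model page

# Keyword arguments shared by tokenizer and model loading.
# trust_remote_code=True lets transformers run the custom modeling code in the repo.
LOAD_KWARGS = {"trust_remote_code": True}


def load_pyracode(repo_id: str = REPO_ID):
    """Load tokenizer and model; downloads the weights on first call."""
    # Imported lazily so this file can be read or imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, **LOAD_KWARGS)
    # Assumption: the repo maps its custom architecture to AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(repo_id, **LOAD_KWARGS)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a completion for `prompt` with default decoding settings."""
    tokenizer, model = load_pyracode()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because `trust_remote_code=True` executes Python code shipped with the repository, it is worth reviewing that code before loading the model.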

	### Prerequisites
	```bash
	pip install transformers torch