abcsk123 committed (verified) · Commit ce5016a · 1 Parent(s): 49f6b1c

Update README.md

Files changed (1): README.md (+40 −3)
---
license: apache-2.0
library_name: transformers
tags:
- custom-code
- qwen2
- mla
- gqa
- attention-sinks
language:
- en
- zh
---

# Qwen2.5-Coder-1.5B-Hybrid-v9

## 🌟 Model Overview
This is a custom-architected model based on `Qwen2.5-Coder-1.5B`. We introduced a novel **Asymmetric Hybrid Architecture (GQA + MLA)** with **Cross-Layer Shared Latent Gates** and **Attention Sinks**, enabling efficient feature communication and a reduced KV-cache memory footprint.

## 🏗️ Architecture Innovations
*(Insert the architecture diagram generated with `picture.py` here; dragging the image into the Hugging Face web editor will generate the link automatically.)*
![Hybrid Architecture](your-image-link-here)

Unlike standard Qwen2 models, this `Hybrid-v9` backbone features:
1. **Asymmetric Layers:**
   * **L0-L6:** Standard GQA (Grouped-Query Attention) for robust low-level feature extraction.
   * **L7 (Shared Hub):** Generates a global latent vector $c_{kv}$ (rank 320).
   * **L8-L27:** Soft MLA (Multi-Head Latent Attention) with SVD-initialized low-rank projections.
2. **Shared Latent Gate:** Deep layers can dynamically access the global latent vector from L7 via a learnable gating mechanism (`warmup_alpha`).
3. **HybridCache & Attention Sinks:** Implements a sliding window (8192 tokens) alongside a 64-token attention sink to maintain generation stability at effectively unbounded sequence lengths.
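The shared-latent-gate idea above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the repo's actual modeling code: the class name `SharedLatentGate`, the blending formula, and the hidden size of 1536 (Qwen2.5-1.5B's) are assumptions; only the rank of 320 and the `warmup_alpha` gate come from the description.

```python
import torch
import torch.nn as nn

class SharedLatentGate(nn.Module):
    """Illustrative sketch: a deep layer blends its own low-rank KV latent
    with the global latent c_kv produced by the shared hub layer (L7).
    `warmup_alpha` is a learnable scalar, squashed to (0, 1) by a sigmoid."""

    def __init__(self, hidden_size: int, latent_rank: int = 320):
        super().__init__()
        self.down_proj = nn.Linear(hidden_size, latent_rank, bias=False)  # low-rank compression
        self.warmup_alpha = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5 at init

    def forward(self, hidden_states: torch.Tensor, hub_latent: torch.Tensor) -> torch.Tensor:
        local_latent = self.down_proj(hidden_states)   # (batch, seq, rank)
        gate = torch.sigmoid(self.warmup_alpha)        # scalar gate in (0, 1)
        return gate * hub_latent + (1.0 - gate) * local_latent

# Usage: blend a deep layer's hidden states with the hub's global latent.
layer_gate = SharedLatentGate(hidden_size=1536, latent_rank=320)
x = torch.randn(2, 8, 1536)      # deep-layer hidden states
c_kv = torch.randn(2, 8, 320)    # global latent from the shared hub (L7)
out = layer_gate(x, c_kv)
print(out.shape)  # torch.Size([2, 8, 320])
```

During training, initializing `warmup_alpha` near zero keeps the gate balanced, so deep layers can learn how much to rely on the hub latent versus their own compressed projection.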

## 🚀 Quick Start

**⚠️ IMPORTANT:** Because this model uses a custom architecture, you **MUST** pass `trust_remote_code=True` when loading it.

### Prerequisites
```bash
pip install transformers torch
```
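With the prerequisites installed, loading follows the usual `transformers` pattern; the only special requirement is `trust_remote_code=True`. A minimal sketch (the repo id below is assumed from this model page; replace it with the actual path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abcsk123/Qwen2.5-Coder-1.5B-Hybrid-v9"  # assumed repo id

# trust_remote_code=True is required: the hybrid GQA+MLA layers live in
# this repo's custom modeling code, not in the transformers library itself.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that this downloads the model weights on first use; add `torch_dtype` or `device_map` arguments as appropriate for your hardware.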