Delete chip-space/README_HF.md
- chip-space/README_HF.md +0 -57
chip-space/README_HF.md
DELETED
@@ -1,57 +0,0 @@
----
-title: CHIP — Chinese High-density Instruction Protocol
-emoji: 🀄
-colorFrom: blue
-colorTo: orange
-sdk: gradio
-sdk_version: 4.44.0
-app_file: app.py
-pinned: true
-license: apache-2.0
-short_description: Data-driven, protocol-based compression tool for Chinese prompts
-tags:
-  - chinese
-  - prompt-engineering
-  - llm
-  - tokenizer
-  - compression
----
-
-# CHIP · Chinese High-density Instruction Protocol
-
-Automatically compresses verbose Chinese prompts into a structured, high-density form: **driven by data, not taste**.
-
-## 🎯 Key findings
-
-Based on 1,800 measured rows (9 mainstream tokenizers × 200 sentences from the FLORES-200 parallel corpus):
-
-- **On 6 Chinese-developed tokenizers, a Chinese prompt uses no more tokens than its English equivalent**
-  (Baichuan2: Chinese saves 12.5%; DeepSeek-V3: 8.4%; GLM-4: 7.6%)
-- **On OpenAI cl100k, Chinese costs 73% more tokens than English**
-- **The `###` tag is a single token on all 9 tokenizers**, far outperforming the square-bracket scheme
-
-## 🔧 How to use
-
-Paste your Chinese prompt on the left, choose a target model, and click Compress. The right side shows:
-
-1. **The compressed prompt** (one-click copy)
-2. **Token statistics** (how many tokens were saved on the tokenizer you selected)
-3. **Rules matched** (an audit trail that makes every change traceable)
-
-## 📦 GitHub / pip
-
-```bash
-pip install chip-prompt
-```
-
-```python
-from chip import compress
-compress("请你帮我对下面这段文字进行一个全面的分析")  # "Please help me carry out a comprehensive analysis of the following text"
-# → '分析下面这段文字'  ("Analyze the following text")
-```
-
-🔗 [GitHub repo](https://github.com/marcuscw/CHIP) · [SPEC.md](https://github.com/marcuscw/CHIP/blob/main/SPEC.md) · [Datasets](https://github.com/marcuscw/CHIP/tree/main/results)
-
-## ⚖️ License
-
-Apache-2.0