---
language:
- zh
license: apache-2.0
tags:
- finance
- cryptocurrency
- chinese
- news-scoring
- text-classification
- text-regression
pipeline_tag: text-classification
library_name: transformers
base_model: LocalOptimum/chinese-crypto-sentiment
metrics:
- mae
- accuracy
- pearsonr
model-index:
- name: chinese-crypto-importance (v1.1)
  results:
  - task:
      type: text-classification
      name: News Importance Binning
    metrics:
    - type: mae
      value: 6.87
      name: MAE
    - type: accuracy
      value: 61.8%
      name: Bin Accuracy
    - type: pearsonr
      value: 0.532
      name: Pearson r
---
# Chinese Crypto News Importance Scoring Model (v1.1)
## Model Description
This model is LoRA fine-tuned from [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment) to assess the "market importance" of Chinese cryptocurrency news rather than its sentiment polarity.
It uses a dual-head architecture and outputs both:
- `importance_score`: a continuous score from 0 to 100 that measures the news item's potential market impact
- `importance_bin`: a 4-way classification into `noise` / `low` / `medium` / `high`

The question it answers is whether a news item deserves priority attention from traders, researchers, or automated news feeds, not merely whether the text is bullish or bearish. This makes it suited to ranking and filtering workflows.
## Training Data
- Size: 20286 Chinese crypto news samples
- Source: 19729 news articles + 557 tweets collected via EventAlpha / WatchTower
- Labeling: 4-axis automatic scoring pipeline with rule-based cleanup
- Split: random split with 17243 training and 3043 validation samples
- Average Score: 41.7
### Scoring Axes
| Axis | Range | Description |
|---|---:|---|
| Market Reaction | 0-40 | Post-news price move, volume expansion, and volatility reaction |
| Novelty | 0-30 | Whether the item is first-hand, repeated, or part of a digest |
| Content Quality | 0-20 | Information density, numeric detail, token relevance, and noise penalties |
| Source Authority | 0-10 | Credibility of the outlet, platform, and whether it is official |
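
The four axis ranges add up to 100, which suggests that the final label is the sum of the axis scores. The source does not spell out the aggregation, so the following is only a minimal sketch under that assumption; `AxisScores` and `total_score` are hypothetical names, not part of the released code.

```python
from dataclasses import dataclass

# Upper bound of each axis, taken from the Scoring Axes table.
AXIS_MAX = {
    "market_reaction": 40,
    "novelty": 30,
    "content_quality": 20,
    "source_authority": 10,
}


@dataclass
class AxisScores:
    market_reaction: float
    novelty: float
    content_quality: float
    source_authority: float


def total_score(axes: AxisScores) -> float:
    """Clamp each axis to its range and sum them (assumed aggregation)."""
    total = 0.0
    for name, upper in AXIS_MAX.items():
        value = getattr(axes, name)
        total += min(max(value, 0.0), float(upper))
    return total


# Made-up axis values for illustration.
example = AxisScores(market_reaction=22, novelty=18, content_quality=12, source_authority=6)
print(total_score(example))  # 58.0
```

Clamping per axis keeps a single out-of-range component from pushing the total outside 0-100.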
### Label Distribution
| Bin | Score Range | Count | Share | Interpretation |
|---|---:|---:|---:|---|
| `noise` | 0-25 | 1626 | 8.0% | Low-signal, duplicate, digest, or weakly relevant content |
| `low` | 25-50 | 14773 | 72.8% | Routine updates that rarely move the market on their own |
| `medium` | 50-75 | 3840 | 18.9% | Tradeable developments with meaningful but limited impact |
| `high` | 75-100 | 47 | 0.2% | Major events that may materially change price or risk appetite |
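
The score ranges in the table share their endpoints (25, 50, 75), so the card does not say which bin a boundary score belongs to. The sketch below assumes half-open intervals with an inclusive lower bound; that convention and the name `score_to_bin` are assumptions made for illustration.

```python
from bisect import bisect_right

BIN_LABELS = ["noise", "low", "medium", "high"]
BIN_EDGES = [25, 50, 75]  # assumed: a score equal to an edge goes to the higher bin


def score_to_bin(score: float) -> str:
    """Map a 0-100 importance score to its bin label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return BIN_LABELS[bisect_right(BIN_EDGES, score)]


for s in (10, 25, 41.7, 74.9, 100):
    print(s, score_to_bin(s))
```

Under this convention the dataset's average score of 41.7 falls in `low`, consistent with that bin holding most of the samples.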
## Performance Metrics
The current public version performs as follows on the validation set:
| Metric | Value |
|---|---:|
| MAE | 6.87 |
| Bin Accuracy | 61.8% |
| Pearson r | 0.532 |
| Best Epoch | 4 |
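
For readers who want to reproduce these metrics on their own labeled data, here is a framework-free sketch of the three definitions. It assumes that MAE and Pearson r are computed on the 0-100 scores and that bin accuracy compares predicted and reference bin labels; the card does not state these details, and the function names are illustrative.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference scores."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def pearson_r(pred: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation coefficient between two score sequences."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


def bin_accuracy(pred_bins: Sequence[str], true_bins: Sequence[str]) -> float:
    """Fraction of samples whose predicted bin matches the reference bin."""
    return sum(p == t for p, t in zip(pred_bins, true_bins)) / len(true_bins)


# Toy values, not taken from the validation set.
pred_scores, true_scores = [30, 45, 60, 80], [35, 40, 70, 75]
pred_bins = ["low", "low", "medium", "medium"]
true_bins = ["low", "low", "medium", "high"]
print(mae(pred_scores, true_scores))       # 6.25
print(bin_accuracy(pred_bins, true_bins))  # 0.75
print(pearson_r(pred_scores, true_scores))
```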
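## Score Interpretation
| Bin | Score Range | Typical Meaning |
|---|---:|---|
| `noise` | 0-25 | Digests, weakly relevant information, repeated news flashes, low-signal content |
| `low` | 25-50 | Routine updates, ordinary operational actions, subjective commentary, limited catalysts |
| `medium` | 50-75 | Important developments with trading significance that may not be enough to change the broader trend |
| `high` | 75-100 | Hacks, ETF approvals, major regulatory changes, systemic risk events |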
## Usage
### Option 1: load the full dual-head model (recommended)
This approach returns both `importance_score` and `importance_bin`.
```python
import __main__
import sys

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

repo_id = "LocalOptimum/chinese-crypto-importance"
local_dir = snapshot_download(repo_id)

# model.py ships with the repo; put it on the import path.
sys.path.insert(0, local_dir)
from model import NewsImportanceModel

# torch.load unpickles the whole model object, so the class is exposed
# on __main__ for pickle to resolve it.
__main__.NewsImportanceModel = NewsImportanceModel

tokenizer = AutoTokenizer.from_pretrained(local_dir)
# weights_only=False runs arbitrary pickle code: only load files you trust.
model = torch.load(f"{local_dir}/model.pt", map_location="cpu", weights_only=False)
model.eval()

text = "美国现货以太坊 ETF 获批"  # "US spot Ethereum ETF approved"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits, score = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        token_type_ids=inputs.get("token_type_ids"),
    )

# Classification head: pick the most probable of the 4 bins.
probs = torch.softmax(logits, dim=-1)[0]
labels = ["noise", "low", "medium", "high"]
importance_bin = labels[probs.argmax().item()]
# Regression head: rescaled here to the 0-100 range.
importance_score = score.item() * 100

print(importance_bin)
print(round(importance_score, 1))
```
### Option 2: use the classification head only
This approach is compatible with `pipeline("text-classification")`, but it only returns the 4-way bin directly and does not include the continuous score.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
repo_id = "LocalOptimum/chinese-crypto-importance"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(pipe("比特币突破关键阻力位并创下阶段新高"))
```
## Training Configuration
- Base Model: `LocalOptimum/chinese-crypto-sentiment`
- Architecture: BERT backbone + classification head + regression head
- Max Length: 256
- Epochs: 10 (early stopping with patience=3; best epoch=4)
- Batch Size: 16
- Learning Rate: 2e-5
- LoRA: `r=16`, `alpha=32`, `dropout=0.05`
- Loss: `0.6 * cross_entropy + 0.4 * mse`
- Mixed Precision: FP16
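
The loss line combines a classification term and a regression term. Below is a framework-free sketch for a single sample. It assumes the regression target is the score divided by 100 (the inference code multiplies the head output by 100) and that the MSE is taken on that 0-1 scale; neither detail is confirmed by the card, and the function names are hypothetical. Actual training code would presumably use the PyTorch equivalents.

```python
import math
from typing import Sequence

CE_WEIGHT = 0.6   # weights from the loss formula in the configuration list
MSE_WEIGHT = 0.4


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def combined_loss(
    logits: Sequence[float], target_bin: int, pred_score: float, true_score: float
) -> float:
    """0.6 * cross_entropy + 0.4 * mse; scores assumed to be on a 0-1 scale."""
    ce = cross_entropy(logits, target_bin)
    mse = (pred_score - true_score) ** 2
    return CE_WEIGHT * ce + MSE_WEIGHT * mse


# Toy values: uniform logits over 4 bins, score prediction off by 0.1.
loss = combined_loss([0.0, 0.0, 0.0, 0.0], target_bin=1, pred_score=0.5, true_score=0.4)
print(round(loss, 4))  # 0.8358
```

With uniform logits the cross-entropy term is ln 4, so the classification term dominates this toy loss; how the two terms compare in real training depends on the score scale actually used.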
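## Use Cases
- Priority ranking of cryptocurrency news
- Real-time news-flash filtering and alert noise reduction
- News-feed pre-screening for researchers / traders
- Building event-weight features for backtesting and research
- Retrospective analysis of major market events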
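
For the ranking and filtering use cases, one simple pattern is to threshold on the score and sort what remains. The sketch below works on items that have already been scored; `ScoredNews`, `prioritize`, and the default threshold of 50 (the lower edge of `medium`) are illustrative assumptions, not part of the released code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScoredNews:
    title: str
    importance_score: float  # 0-100, e.g. from the regression head
    importance_bin: str      # "noise" / "low" / "medium" / "high"


def prioritize(items: Iterable[ScoredNews], min_score: float = 50.0) -> List[ScoredNews]:
    """Drop items below min_score and return the rest, highest score first."""
    kept = [item for item in items if item.importance_score >= min_score]
    return sorted(kept, key=lambda item: item.importance_score, reverse=True)


# Made-up scores for illustration; real values would come from the model.
feed = [
    ScoredNews("Exchange publishes weekly digest", 18.0, "noise"),
    ScoredNews("Spot ETF approved", 82.5, "high"),
    ScoredNews("Protocol announces mainnet upgrade date", 56.0, "medium"),
]
for item in prioritize(feed):
    print(f"{item.importance_score:5.1f}  {item.importance_bin:6}  {item.title}")
```

Because the model is conservative on high-importance events, a threshold tuned on your own feed is likely to work better than a fixed one.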
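## Annotation Principles
- Importance is not sentiment: both bullish and bearish news can be highly important
- Market reaction comes first, then novelty, content quality, and source credibility
- Repeated news flashes, digest roundups, and weakly relevant macro noise are systematically scored down
- Official announcements, major security incidents, and ETF / regulatory breakthroughs usually score higher
- Subjective opinions and routine operational updates usually fall into `low` or `noise`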
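## Limitations
- The data distribution is heavily skewed toward `low`, so the current version remains conservative on high-importance events
- `high` samples are scarce, so the model's ability to separate extreme high-score events still has room to improve
- Mainly suited to Chinese cryptocurrency news; cross-domain generalization is limited
- The native HuggingFace `pipeline` exposes only the classification head; the continuous score requires loading `model.pt`
- Labels come from an automatic scoring pipeline with rule-based cleanup and are not equivalent to large-scale human financial annotation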
## License
Apache-2.0
## Citation
If you use this model in research or in a product, you can cite it as:
```bibtex
@misc{onefly_crypto_importance_2026,
title={Chinese Crypto News Importance Scoring Model},
author={Onefly},
year={2026},
howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-importance}},
note={LoRA fine-tuned from LocalOptimum/chinese-crypto-sentiment, 20286 samples, MAE=6.87, BinAcc=61.8%}
}
```
## Base Model
This model was further trained from the following model:
- [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment)
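## Changelog
### Current Public Version
- First public release of the importance scoring model
- Supports dual-head output: continuous importance score + 4-way importance bin
- Trained on 20286 Chinese crypto news samples
- Current validation metrics: MAE=6.87, Bin Accuracy=61.8%, Pearson r=0.532

Questions and suggestions are welcome via issues or PRs.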