---

language:
  - zh
license: apache-2.0
tags:
  - finance
  - cryptocurrency
  - chinese
  - news-scoring
  - text-classification
  - text-regression
pipeline_tag: text-classification
library_name: transformers
base_model: LocalOptimum/chinese-crypto-sentiment
metrics:
  - mae
  - accuracy
  - pearsonr
model-index:
  - name: chinese-crypto-importance (v1.1)
    results:
      - task:
          type: text-classification
          name: News Importance Binning
        metrics:
          - type: mae
            value: 6.87
            name: MAE
          - type: accuracy
            value: 61.8%
            name: Bin Accuracy
          - type: pearsonr
            value: 0.532
            name: Pearson r
---


# Chinese Crypto News Importance Scoring Model (v1.1)

## Model Description

This model is LoRA fine-tuned from [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment) to score the market importance of Chinese cryptocurrency news rather than its sentiment polarity.

The model has a dual-head architecture and outputs both:

- `importance_score`: a continuous score from 0 to 100 that measures the potential market impact of a news item
- `importance_bin`: a 4-way bin classification, one of `noise` / `low` / `medium` / `high`

The question it answers is whether a news item deserves priority attention from traders, researchers, or automated news pipelines, not merely whether the text is bullish or bearish. The two outputs are meant for ranking and filtering workflows.

## Training Data

- Size: 20286 Chinese crypto news samples
- Source: 19729 news articles + 557 tweets collected via EventAlpha / WatchTower
- Labeling: 4-axis automatic scoring pipeline with rule-based cleanup
- Split: random split with 17243 train and 3043 validation samples
- Average Score: 41.7

### Scoring Axes

| Axis | Range | Description |
|---|---:|---|
| Market Reaction | 0-40 | Post-news price move, volume expansion, and volatility reaction |
| Novelty | 0-30 | Whether the item is first-hand, repeated, or part of a digest |
| Content Quality | 0-20 | Information density, numeric detail, token relevance, and noise penalties |
| Source Authority | 0-10 | Credibility of the outlet, platform, and whether it is official |
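
The four axis ranges cap at 40, 30, 20, and 10 points, which add up to 100. The sketch below shows one way per-axis scores could combine into a 0-100 label. It assumes a plain sum after clamping each axis to its documented range; the actual pipeline and its rule-based cleanup are not described here, and `AXIS_MAX` and `total_importance` are hypothetical names.

```python
# Hypothetical sketch: combine per-axis scores into a 0-100 importance label.
# Assumption: the total is a plain sum of the four axes, each clamped to its
# documented range. The real labeling pipeline may differ.
AXIS_MAX = {
    "market_reaction": 40,
    "novelty": 30,
    "content_quality": 20,
    "source_authority": 10,
}


def total_importance(axis_scores: dict) -> float:
    """Sum the axis scores, clamping each one to [0, axis maximum]."""
    total = 0.0
    for axis, cap in AXIS_MAX.items():
        value = axis_scores.get(axis, 0.0)  # a missing axis contributes nothing
        total += min(max(value, 0.0), cap)
    return total


# Made-up axis scores for illustration only.
example = {
    "market_reaction": 32,
    "novelty": 25,
    "content_quality": 15,
    "source_authority": 9,
}
print(total_importance(example))  # 81.0
```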

### Label Distribution

| Bin | Score Range | Count | Share | Interpretation |
|---|---:|---:|---:|---|
| `noise` | 0-25 | 1626 | 8.0% | Low-signal, duplicate, digest, or weakly relevant content |
| `low` | 25-50 | 14773 | 72.8% | Routine updates that rarely move the market on their own |
| `medium` | 50-75 | 3840 | 18.9% | Tradeable developments with meaningful but limited impact |
| `high` | 75-100 | 47 | 0.2% | Major events that may materially change price or risk appetite |

## Performance Metrics

The current public release scores as follows on the validation set:

| Metric | Value |
|---|---:|
| MAE | 6.87 |
| Bin Accuracy | 61.8% |
| Pearson r | 0.532 |
| Best Epoch | 4 |
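
For readers who want to reproduce these three metrics on their own predictions, here is a minimal plain-Python sketch. It assumes MAE and Pearson r are computed on the 0-100 score and Bin Accuracy on the 4-way bin labels; the function names and toy data are illustrative and do not come from the training code.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bin_accuracy(bins_true, bins_pred):
    """Fraction of items whose predicted bin matches the true bin."""
    return sum(t == p for t, p in zip(bins_true, bins_pred)) / len(bins_true)


def pearson_r(x, y):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Toy data for illustration only, not real model output.
true_scores = [10, 40, 60, 90]
pred_scores = [20, 35, 55, 70]
true_bins = ["noise", "low", "medium", "high"]
pred_bins = ["noise", "low", "medium", "medium"]

print(mae(true_scores, pred_scores))       # 10.0
print(bin_accuracy(true_bins, pred_bins))  # 0.75
print(pearson_r(true_scores, pred_scores))
```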

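## Score Interpretation

| Bin | Score Range | Typical Meaning |
|---|---:|---|
| `noise` | 0-25 | Digests, weakly related information, repeated flashes, low-signal content |
| `low` | 25-50 | Routine updates, ordinary operational actions, subjective commentary, limited catalysts |
| `medium` | 50-75 | Important developments with trading relevance that may not change the broader trend |
| `high` | 75-100 | Hacks, ETF approvals, major regulatory changes, systemic risk events |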
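
The ranges in the table share their boundary values (25, 50, 75), so any mapping from a score to a bin has to pick a side. The sketch below assumes each bin includes its lower bound and excludes its upper bound, with 100 falling into `high`. That boundary rule is an assumption, and `score_to_bin` is a hypothetical helper, not part of the released code.

```python
def score_to_bin(score: float) -> str:
    """Map a 0-100 importance score to one of the four bins.

    Assumption: bins are half-open, [0, 25), [25, 50), [50, 75), [75, 100].
    The model card does not state how boundary values are assigned.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score < 25:
        return "noise"
    if score < 50:
        return "low"
    if score < 75:
        return "medium"
    return "high"


print(score_to_bin(41.7))  # low
print(score_to_bin(75))    # high
```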

## Usage

### Option 1: load the full dual-head model (recommended)

This option returns both `importance_score` and `importance_bin`.

```python
import __main__
import sys

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download the repo and make its bundled model.py importable.
repo_id = "LocalOptimum/chinese-crypto-importance"
local_dir = snapshot_download(repo_id)
sys.path.insert(0, local_dir)

from model import NewsImportanceModel

# Expose the class on __main__ so torch.load can resolve it when unpickling model.pt.
__main__.NewsImportanceModel = NewsImportanceModel

tokenizer = AutoTokenizer.from_pretrained(local_dir)
# weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
model = torch.load(f"{local_dir}/model.pt", map_location="cpu", weights_only=False)
model.eval()

text = "美国现货以太坊 ETF 获批"  # "US spot Ethereum ETF approved"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits, score = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        token_type_ids=inputs.get("token_type_ids"),
    )
    probs = torch.softmax(logits, dim=-1)[0]
    labels = ["noise", "low", "medium", "high"]
    importance_bin = labels[probs.argmax().item()]
    importance_score = score.item() * 100  # scale the regression output to 0-100

print(importance_bin)
print(round(importance_score, 1))
```

### Option 2: use the HuggingFace classification head only

This option is compatible with `pipeline("text-classification")`, but it only returns the 4-way bin directly and does not include the continuous score.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

repo_id = "LocalOptimum/chinese-crypto-importance"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Input: "Bitcoin breaks through key resistance and hits a new local high"
print(pipe("比特币突破关键阻力位并创下阶段新高"))
```
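
The `text-classification` pipeline returns a list of dicts with `label` and `score` keys, where `score` is the class probability rather than the 0-100 importance score. When a model config defines no `id2label` mapping, transformers falls back to generic names such as `LABEL_0`. Whether this repo's config defines readable names is not stated, so the sketch below handles both cases. It assumes the class order matches the `labels` list from Option 1; `readable_label` is a hypothetical helper.

```python
# Assumption: class index order matches the labels list used in Option 1.
BIN_LABELS = ["noise", "low", "medium", "high"]


def readable_label(raw_label: str) -> str:
    """Turn 'LABEL_2' into 'medium'; pass through names that are already readable."""
    if raw_label in BIN_LABELS:
        return raw_label
    if raw_label.startswith("LABEL_"):
        return BIN_LABELS[int(raw_label.split("_", 1)[1])]
    raise ValueError(f"unexpected label: {raw_label}")


# Made-up pipeline-style result for illustration, not real model output.
result = [{"label": "LABEL_3", "score": 0.91}]
cleaned = [{**item, "label": readable_label(item["label"])} for item in result]
print(cleaned)  # [{'label': 'high', 'score': 0.91}]
```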

## Training Configuration

- Base Model: `LocalOptimum/chinese-crypto-sentiment`
- Architecture: BERT backbone + classification head + regression head
- Max Length: 256
- Epochs: 10 (early stopping with patience=3; best epoch=4)
- Batch Size: 16
- Learning Rate: 2e-5
- LoRA: `r=16`, `alpha=32`, `dropout=0.05`
- Loss: `0.6 * cross_entropy + 0.4 * mse`
- Mixed Precision: FP16
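
To make the loss weighting concrete, here is a single-sample sketch of `0.6 * cross_entropy + 0.4 * mse` in plain Python. It assumes the regression target is scaled to 0-1, which is consistent with the `score.item() * 100` step in the usage example but is not confirmed by the training code. In practice the loss would be computed per batch with framework ops; the function names here are illustrative.

```python
import math

CE_WEIGHT = 0.6   # weight on the classification (bin) loss
MSE_WEIGHT = 0.4  # weight on the regression (score) loss


def cross_entropy(logits, target_index):
    """Cross-entropy of one sample, using a numerically stable log-sum-exp."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]


def combined_loss(logits, target_bin, pred_score, target_score):
    """Weighted sum of bin cross-entropy and squared score error for one sample.

    Assumption: pred_score and target_score are on a 0-1 scale.
    """
    ce = cross_entropy(logits, target_bin)
    mse = (pred_score - target_score) ** 2
    return CE_WEIGHT * ce + MSE_WEIGHT * mse


# Uniform logits over the 4 bins give a cross-entropy of ln(4).
loss = combined_loss([0.0, 0.0, 0.0, 0.0], target_bin=1, pred_score=0.5, target_score=0.5)
print(loss)
```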

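## Use Cases

- Priority ranking of cryptocurrency news
- Filtering real-time news flashes and reducing alert noise
- Pre-screening news feeds for researchers and traders
- Building event-weight features for backtesting and research
- Retrospective analysis of major market events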
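
As an illustration of the ranking and filtering use cases, the sketch below keeps items at or above a score threshold and sorts them by importance. The default threshold of 50 (the lower edge of the `medium` bin) is an assumption chosen for the example, and `ScoredNews` and `prioritize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNews:
    title: str
    importance_score: float  # 0-100, e.g. from the regression head


def prioritize(
    items: List[ScoredNews],
    min_score: float = 50.0,
    top_k: Optional[int] = None,
) -> List[ScoredNews]:
    """Drop items below min_score and return the rest, most important first."""
    kept = [item for item in items if item.importance_score >= min_score]
    kept.sort(key=lambda item: item.importance_score, reverse=True)
    return kept if top_k is None else kept[:top_k]


# Made-up scores for illustration only.
feed = [
    ScoredNews("Daily market digest", 18.0),
    ScoredNews("Exchange reports a security incident", 82.5),
    ScoredNews("Protocol ships a routine upgrade", 44.0),
    ScoredNews("Regulator opens a consultation", 61.0),
]
for item in prioritize(feed):
    print(item.importance_score, item.title)
```

Raising `min_score` trades recall for fewer alerts. Given the model's conservative behavior on high-importance events, a threshold near 75 may filter out more than intended.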

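## Annotation Principles

- Importance is not sentiment: both bullish and bearish news can be highly important
- Market reaction comes first, followed by novelty, content quality, and source credibility
- Repeated flashes, digest roundups, and weakly related macro noise are systematically scored down
- Official announcements, major security incidents, and ETF or regulatory breakthroughs usually score higher
- Subjective opinions and routine operational updates usually fall into `low` or `noise`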

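## Limitations

- The label distribution is heavily skewed toward `low`, so the current version remains conservative on high-importance events
- The `high` bin has few samples (47 of 20286), so the model's ability to separate extreme high-score events still has room to improve
- The model is intended mainly for Chinese cryptocurrency news; generalization to other domains is limited
- The native HuggingFace `pipeline` exposes only the classification head; the continuous score requires loading `model.pt`
- Labels come from an automatic scoring pipeline with rule-based corrections and are not equivalent to large-scale human financial annotation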

## License

Apache-2.0

## Citation

If you use this model in research or a product, you can cite it as:

```bibtex
@misc{onefly_crypto_importance_2026,
  title={Chinese Crypto News Importance Scoring Model},
  author={Onefly},
  year={2026},
  howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-importance}},
  note={LoRA fine-tuned from LocalOptimum/chinese-crypto-sentiment, 20286 samples, MAE=6.87, BinAcc=61.8%}
}
```

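## Base Model

This model continues training from:

- [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment)

## Changelog

### Current Public Version

- First public release of the importance scoring model
- Supports dual-head output: a continuous importance score plus a 4-way importance bin
- Trained on 20286 Chinese cryptocurrency news samples
- Current validation metrics: MAE=6.87, Bin Accuracy=61.8%, Pearson r=0.532

Questions and suggestions are welcome via issues or PRs.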