| ---
|
| language:
|
| - zh
|
| license: apache-2.0
|
| tags:
|
| - finance
|
| - cryptocurrency
|
| - chinese
|
| - news-scoring
|
| - text-classification
|
| - text-regression
|
| pipeline_tag: text-classification
|
| library_name: transformers
|
| base_model: LocalOptimum/chinese-crypto-sentiment
|
| metrics:
|
| - mae
|
| - accuracy
|
| - pearsonr
|
| model-index:
|
| - name: chinese-crypto-importance (v1.1)
|
|   results:
|
|   - task:
|
|       type: text-classification
|
|       name: News Importance Binning
|
|     metrics:
|
|     - type: mae
|
|       value: 6.87
|
|       name: MAE
|
|     - type: accuracy
|
|       value: 61.8%
|
|       name: Bin Accuracy
|
|     - type: pearsonr
|
|       value: 0.532
|
|       name: Pearson r
|
| ---
|
|
|
| # Chinese Crypto News Importance Scoring Model | 中文加密货币新闻重要性评分模型 (v1.1)
|
|
|
| ## 模型描述 | Model Description
|
|
|
| This model is LoRA fine-tuned from [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment) to score the market importance of Chinese cryptocurrency news rather than its sentiment polarity.
|
|
|
| It uses a dual-head architecture and outputs two values for each input, which makes it suitable for ranking and filtering workflows:
|
|
|
| - `importance_score`: a continuous score from 0 to 100 that measures the news item's potential market impact
|
| - `importance_bin`: a 4-way classification into `noise` / `low` / `medium` / `high`
|
|
|
| The question it answers is whether a news item deserves priority attention from traders, researchers, or automated news feeds, not merely whether the text is bullish or bearish.
|
|
|
| ## 训练数据 | Training Data
|
|
|
| - 数据量 | Size: 20286 条中文加密货币新闻样本 | 20286 Chinese crypto news samples
|
| - 数据来源 | Source: EventAlpha / WatchTower 采集的 19729 条新闻 + 557 条推文 | 19729 news articles + 557 tweets collected via EventAlpha / WatchTower
|
| - 标注方式 | Labeling: 自动四维评分管线 + 规则修正 | 4-axis automatic scoring pipeline with rule-based cleanup
|
| - 划分方式 | Split: 随机划分,训练集 17243 / 验证集 3043 | Random split with 17243 train and 3043 validation samples
|
| - 平均分数 | Average Score: 41.7
|
|
|
| ### 标注维度 | Scoring Axes
|
|
|
| | Axis | Range | Description |
|
| |---|---:|---|
|
| | Market Reaction | 0-40 | Post-news price move, volume expansion, and volatility reaction |
|
| | Novelty | 0-30 | Whether the item is first-hand, repeated, or part of a digest |
|
| | Content Quality | 0-20 | Information density, numeric detail, token relevance, and noise penalties |
|
| | Source Authority | 0-10 | Credibility of the outlet, platform, and whether it is official |
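The four axis ranges add up to 100, which suggests that the overall label is the sum of the per-axis scores. The sketch below illustrates that reading only. The `AxisScores` class, its field names, and the simple-sum aggregation are assumptions made for illustration; the card does not describe how the labeling pipeline actually combines the axes.

```python
from dataclasses import dataclass

# Upper bound of each axis, copied from the table above.
AXIS_MAX = {
    "market_reaction": 40,
    "novelty": 30,
    "content_quality": 20,
    "source_authority": 10,
}


@dataclass(frozen=True)
class AxisScores:
    """Hypothetical container for the four per-axis scores."""

    market_reaction: float
    novelty: float
    content_quality: float
    source_authority: float

    def total(self) -> float:
        """Check each axis against its range, then sum them (assumed aggregation)."""
        total = 0.0
        for name, upper in AXIS_MAX.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name}={value} is outside 0-{upper}")
            total += value
        return total


example = AxisScores(market_reaction=28, novelty=20, content_quality=14, source_authority=8)
print(example.total())  # 70.0
```

Under this assumption an item can only reach the `high` range if it scores well on Market Reaction, since the other three axes together contribute at most 60 points.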
|
|
|
| ### 数据分布 | Label Distribution
|
|
|
| | Bin | Score Range | Count | Share | 含义 / Interpretation |
|
| |---|---:|---:|---:|---|
|
| | `noise` | 0-25 | 1626 | 8.0% | Low-signal, duplicate, digest, or weakly relevant content |
|
| | `low` | 25-50 | 14773 | 72.8% | Routine updates that rarely move the market on their own |
|
| | `medium` | 50-75 | 3840 | 18.9% | Tradeable developments with meaningful but limited impact |
|
| | `high` | 75-100 | 47 | 0.2% | Major events that may materially change price or risk appetite |
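The shares in the table follow directly from the counts. A quick check that uses only the numbers above also makes the class imbalance concrete:

```python
# Bin counts copied from the label distribution table.
counts = {"noise": 1626, "low": 14773, "medium": 3840, "high": 47}

total = sum(counts.values())
# Share of each bin in percent, rounded to one decimal place as in the table.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
# Ratio between the largest and the smallest bin.
imbalance = counts["low"] / counts["high"]

print(total)                # 20286
print(shares)               # {'noise': 8.0, 'low': 72.8, 'medium': 18.9, 'high': 0.2}
print(round(imbalance, 1))  # 314.3
```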
|
|
|
| ## 性能指标 | Performance Metrics
|
|
|
| The current public release achieves the following results on the validation set:
|
|
|
| | 指标 Metric | 数值 Value |
|
| |---|---:|
|
| | MAE | 6.87 |
|
| | Bin Accuracy | 61.8% |
|
| | Pearson r | 0.532 |
|
| | Best Epoch | 4 |
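For readers who want to reproduce these metrics on their own predictions, the following is a minimal sketch of how the three numbers are conventionally computed. It assumes MAE and Pearson r are taken over the 0-100 scores and that bin accuracy is the fraction of exactly matching bins; the card does not include the evaluation script, and the toy values are made up.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equally long score lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bin_accuracy(bins_true, bins_pred):
    """Fraction of samples whose predicted bin equals the labeled bin."""
    return sum(t == p for t, p in zip(bins_true, bins_pred)) / len(bins_true)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values, not taken from the validation set.
true_scores = [10.0, 40.0, 60.0, 80.0]
pred_scores = [20.0, 40.0, 50.0, 70.0]
true_bins = ["noise", "low", "medium", "high"]
pred_bins = ["noise", "low", "low", "medium"]

print(mae(true_scores, pred_scores))       # 7.5
print(bin_accuracy(true_bins, pred_bins))  # 0.5
print(round(pearson_r(true_scores, pred_scores), 3))
```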
|
|
|
| ## 分数解释 | Score Interpretation
|
|
|
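| | Bin | Score Range | Typical Meaning |
|
| |---|---:|---|
|
| | `noise` | 0-25 | Digests, weakly related information, repeated newsflashes, low-signal content |
|
| | `low` | 25-50 | Routine updates, ordinary operational moves, subjective commentary, limited catalysts |
|
| | `medium` | 50-75 | Important developments with trading relevance that may not be enough to change the broader trend |
|
| | `high` | 75-100 | Hacks, ETF approvals, major regulatory changes, systemic risk events |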
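The table lists shared boundaries (25, 50, 75) without saying which bin a boundary score belongs to. The sketch below maps a continuous score to a bin under the assumption that each lower bound is inclusive, so a score of exactly 50 counts as `medium`. The helper name `score_to_bin` is hypothetical, and the model's classification head predicts its bin separately, so the two may occasionally disagree.

```python
import bisect

BIN_LABELS = ["noise", "low", "medium", "high"]
BIN_EDGES = [25, 50, 75]  # boundaries from the table above


def score_to_bin(score: float) -> str:
    """Map a 0-100 score to a bin, treating lower bounds as inclusive (assumption)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    # bisect_right counts the edges that are <= score, which is the bin index.
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, score)]


for value in (12.0, 25.0, 49.9, 75.0, 100.0):
    print(value, score_to_bin(value))
```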
|
|
|
| ## 使用方法 | Usage
|
|
|
| ### 方式一:加载完整双头模型(推荐) | Option 1: load the full dual-head model
|
|
|
| This option returns both `importance_score` and `importance_bin`.
|
|
|
| ```python
|
| import __main__
|
| import sys
|
| import torch
|
| from huggingface_hub import snapshot_download
|
| from transformers import AutoTokenizer
|
|
|
| # Download the repository so that model.py and model.pt are available locally.
|
| repo_id = "LocalOptimum/chinese-crypto-importance"
|
| local_dir = snapshot_download(repo_id)
|
| sys.path.insert(0, local_dir)
|
|
|
| from model import NewsImportanceModel
|
|
|
| # Expose the class on __main__ so that unpickling model.pt can resolve it.
|
| __main__.NewsImportanceModel = NewsImportanceModel
|
|
|
| tokenizer = AutoTokenizer.from_pretrained(local_dir)
|
| # weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
|
| model = torch.load(f"{local_dir}/model.pt", map_location="cpu", weights_only=False)
|
| model.eval()
|
|
|
| text = "美国现货以太坊 ETF 获批"  # "US spot Ethereum ETF approved"
|
| inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
|
|
|
| with torch.no_grad():
|
|     logits, score = model(
|
|         input_ids=inputs["input_ids"],
|
|         attention_mask=inputs["attention_mask"],
|
|         token_type_ids=inputs.get("token_type_ids"),
|
|     )
|
|
|
| # Classification head: pick the most probable of the four bins.
|
| probs = torch.softmax(logits, dim=-1)[0]
|
| labels = ["noise", "low", "medium", "high"]
|
| importance_bin = labels[probs.argmax().item()]
|
| # Regression head: scale the raw output by 100 to obtain the 0-100 score.
|
| importance_score = score.item() * 100
|
|
|
| print(importance_bin)
|
| print(round(importance_score, 1))
|
| ```
|
|
|
| ### 方式二:仅使用 HuggingFace 分类头 | Option 2: use the classification head only
|
|
|
| This option is compatible with `pipeline("text-classification")`, but it returns only the 4-way bin and not the continuous score.
|
|
|
| ```python
|
| from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
|
|
|
| repo_id = "LocalOptimum/chinese-crypto-importance"
|
| tokenizer = AutoTokenizer.from_pretrained(repo_id)
|
| model = AutoModelForSequenceClassification.from_pretrained(repo_id)
|
|
|
| pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
|
| print(pipe("比特币突破关键阻力位并创下阶段新高"))
|
| ```
|
|
|
| ## 训练配置 | Training Configuration
|
|
|
| - 基础模型 | Base Model: `LocalOptimum/chinese-crypto-sentiment`
|
| - 模型结构 | Architecture: BERT backbone + classification head + regression head
|
| - 最大长度 | Max Length: 256
|
| - 训练轮数 | Epochs: up to 10 (early stopping with patience=3; best epoch=4)
|
| - 批次大小 | Batch Size: 16
|
| - 学习率 | Learning Rate: 2e-5
|
| - LoRA: `r=16`, `alpha=32`, `dropout=0.05`
|
| - 损失函数 | Loss: `0.6 * cross_entropy + 0.4 * mse`
|
| - 混合精度 | Mixed Precision: FP16
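The loss listed above is a weighted sum of the classification and regression objectives. The following is a minimal plain-Python sketch for a single sample; it is not the training code. It assumes the regression target is on a 0-1 scale (the inference example multiplies the regression output by 100), and the function names and sample values are made up for illustration. In training, the same quantity would be computed over batched tensors.

```python
import math

CE_WEIGHT = 0.6   # weight of the cross-entropy term, from the configuration above
MSE_WEIGHT = 0.4  # weight of the MSE term, from the configuration above


def cross_entropy(logits, target_index):
    """Cross-entropy for one sample, using a numerically stable log-sum-exp."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target_index]


def combined_loss(logits, target_index, pred_score, target_score):
    """Return 0.6 * cross_entropy + 0.4 * squared error for one sample."""
    ce = cross_entropy(logits, target_index)
    mse = (pred_score - target_score) ** 2
    return CE_WEIGHT * ce + MSE_WEIGHT * mse


# Made-up values: 4-way logits, labeled bin "medium" (index 2), scores on an assumed 0-1 scale.
loss = combined_loss([0.1, 0.5, 2.0, -1.0], 2, pred_score=0.58, target_score=0.62)
print(round(loss, 4))
```

Because the cross-entropy term carries the larger weight, a wrong bin is penalized more heavily than a moderately inaccurate score on this scale.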
|
|
|
| ## 适用场景 | Use Cases
|
|
|
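| - Priority ranking of cryptocurrency news
|
| - Real-time newsflash filtering and alert noise reduction
|
| - Pre-screening of news feeds for researchers and traders
|
| - Building event-weight features for backtesting and research
|
| - Retrospective analysis of major market events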
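As an illustration of the ranking and filtering use cases, the sketch below drops low-scoring items and sorts the rest by score. The `ScoredNews` class, the `prioritize` helper, the threshold of 50, and the sample headlines are all hypothetical; in practice the scores and bins would come from the model outputs.

```python
from dataclasses import dataclass


@dataclass
class ScoredNews:
    """Hypothetical record holding one news item and its model outputs."""

    title: str
    importance_score: float  # 0-100
    importance_bin: str


def prioritize(items, min_score=50.0):
    """Drop items below min_score and sort the rest by descending score."""
    kept = [item for item in items if item.importance_score >= min_score]
    return sorted(kept, key=lambda item: item.importance_score, reverse=True)


# Made-up feed with made-up scores.
feed = [
    ScoredNews("Exchange publishes weekly digest", 18.0, "noise"),
    ScoredNews("Protocol reports a bridge exploit", 82.5, "high"),
    ScoredNews("Project announces routine maintenance", 34.0, "low"),
    ScoredNews("Regulator opens consultation on stablecoins", 61.0, "medium"),
]

for item in prioritize(feed):
    print(f"{item.importance_score:5.1f}  {item.importance_bin:<6}  {item.title}")
```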
|
|
|
| ## 核心标注原则 | Annotation Principles
|
|
|
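| - Importance is not sentiment: both bullish and bearish news can be highly important
|
| - Market reaction is weighed first, then combined with novelty, content quality, and source credibility
|
| - Repeated newsflashes, digest roundups, and weakly related macro noise are systematically scored down
|
| - Official announcements, major security incidents, and ETF or regulatory breakthroughs usually score higher
|
| - Subjective opinions and routine operational updates usually fall into `low` or `noise`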
|
|
|
| ## 局限性 | Limitations
|
|
|
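| - The data distribution is heavily skewed toward `low`, so the current version remains conservative on high-importance events
|
| - `high` samples are scarce (47 of 20286), and the model's ability to distinguish extreme high-score events still has room for improvement
|
| - Intended mainly for Chinese cryptocurrency news; cross-domain generalization is limited
|
| - The native HuggingFace `pipeline` exposes only the classification head; the continuous score requires loading `model.pt`
|
| - Labels come from an automatic scoring pipeline with rule-based cleanup and are not equivalent to large-scale human financial annotation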
|
|
|
| ## 许可证 | License
|
|
|
| Apache-2.0
|
|
|
| ## 引用 | Citation
|
|
|
| If you use this model in research or a product, you can cite it as:
|
|
|
| ```bibtex
|
| @misc{onefly_crypto_importance_2026,
|
| title={Chinese Crypto News Importance Scoring Model},
|
| author={Onefly},
|
| year={2026},
|
| howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-importance}},
|
| note={LoRA fine-tuned from LocalOptimum/chinese-crypto-sentiment, 20286 samples, MAE=6.87, BinAcc=61.8\%}
|
| }
|
| ```
|
|
|
| ## 基础模型 | Base Model
|
|
|
| This model continues training from the following model:
|
|
|
| - [LocalOptimum/chinese-crypto-sentiment](https://huggingface.co/LocalOptimum/chinese-crypto-sentiment)
|
|
|
| ## 更新日志 | Changelog
|
|
|
| ### 当前公开版本 | Current Public Version
|
|
|
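| - First public release of the importance scoring model
|
| - Dual-head output: continuous importance score + 4-way importance bin
|
| - Trained on 20286 Chinese cryptocurrency news samples
|
| - Current validation metrics: MAE=6.87, Bin Accuracy=61.8%, Pearson r=0.532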
|
|
|
| Questions and suggestions are welcome; please open an issue or PR.
|
|
|