LocalOptimum committed · Commit 82d05ef · verified · 1 Parent(s): 71f5c62

Update README.md

Files changed (1)
  1. README.md +193 -193
README.md CHANGED
@@ -1,193 +1,193 @@
- ---
- language: zh
- license: apache-2.0
- tags:
- - sentiment-analysis
- - chinese
- - finance
- - finbert
- - crypto
- - text-classification
- datasets:
- - custom
- metrics:
- - accuracy
- - f1
- - precision
- - recall
- model-index:
- - name: Chinese Financial Sentiment Analysis (Crypto)
-   results:
-   - task:
-       type: text-classification
-       name: Sentiment Analysis
-     metrics:
-     - type: accuracy
-       value: 0.645
-       name: Accuracy
-     - type: f1
-       value: 0.6365
-       name: F1 Score
-     - type: precision
-       value: 0.6394
-       name: Precision
-     - type: recall
-       value: 0.645
-       name: Recall
- ---
-
- # Chinese Financial Sentiment Analysis Model (Crypto Focus)
-
- 中文金融情感分析模型(加密货币领域)
-
- ## 模型描述 | Model Description
-
- 本模型基于 `yiyanghkust/finbert-tone-chinese` 微调,专门用于分析中文加密货币相关新闻和社交媒体内容的情感倾向。模型可以识别三种情感类别:正面(Positive)、中性(Neutral)和负面(Negative)。
-
- This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` and specifically designed for sentiment analysis of Chinese cryptocurrency-related news and social media content. It can classify text into three sentiment categories: Positive, Neutral, and Negative.
-
- ## 训练数据 | Training Data
-
- - **数据量 | Size**: 1000条人工标注的中文金融新闻 | 1000 manually annotated Chinese financial news articles
- - **数据来源 | Source**: 加密货币相关新闻和推文 | Cryptocurrency-related news and tweets
- - **标注方式 | Annotation**: AI辅助 + 人工修正 | AI-assisted + Manual correction
- - **数据分布 | Distribution**:
-   - Positive(正面): 420条 (42.0%)
-   - Neutral(中性): 420条 (42.0%)
-   - Negative(负面): 160条 (16.0%)
-
- ## 性能指标 | Performance Metrics
-
- 在200条测试集上的表现 | Performance on 200 test samples:
-
- | 指标 Metric | 数值 Value |
- |-------------|-----------|
- | 准确率 Accuracy | 64.50% |
- | F1分数 F1 Score | 63.65% |
- | 精确率 Precision | 63.94% |
- | 召回率 Recall | 64.50% |
-
- ## 使用方法 | Usage
-
- ### 快速开始 | Quick Start
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- # 加载模型和分词器 | Load model and tokenizer
- model_name = "YOUR_USERNAME/sentiment-finetuned-1000"  # 替换为你的用户名 | Replace with your username
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
-
- # 分析文本 | Analyze text
- text = "比特币突破10万美元创历史新高"
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
-
- # 预测 | Predict
- with torch.no_grad():
-     outputs = model(**inputs)
-     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
-     predicted_class = torch.argmax(predictions, dim=-1).item()
-
- # 结果映射 | Result mapping
- labels = ['positive', 'neutral', 'negative']
- sentiment = labels[predicted_class]
- confidence = predictions[0][predicted_class].item()
-
- print(f"情感 | Sentiment: {sentiment}")
- print(f"置信度 | Confidence: {confidence:.4f}")
- ```
-
- ### 批量处理 | Batch Processing
-
- ```python
- texts = [
-     "币安获得阿布扎比监管授权",
-     "以太坊完成Fusaka升级",
-     "某交易所遭攻击损失100万美元"
- ]
-
- inputs = tokenizer(texts, return_tensors="pt", truncation=True,
-                    max_length=128, padding=True)
-
- with torch.no_grad():
-     outputs = model(**inputs)
-     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
-     predicted_classes = torch.argmax(predictions, dim=-1)
-
- labels = ['positive', 'neutral', 'negative']
- for text, pred in zip(texts, predicted_classes):
-     print(f"{text} -> {labels[pred]}")
- ```
-
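- ## 训练参数 | Training Configuration
-
- - **基础模型 | Base Model**: yiyanghkust/finbert-tone-chinese
- - **训练轮数 | Epochs**: 5
- - **批次大小 | Batch Size**: 16
- - **学习率 | Learning Rate**: 2e-5
- - **最大序列长度 | Max Length**: 128
- - **训练设备 | Device**: NVIDIA GeForce RTX 3060 Laptop GPU
- - **训练时间 | Training Time**: ~5分钟 | ~5 minutes
-
- ## 适用场景 | Use Cases
-
- - ✅ 加密货币新闻情感分析 | Cryptocurrency news sentiment analysis
- - ✅ 社交媒体舆情监控 | Social media opinion monitoring
- - ✅ 金融市场情绪指标 | Financial market sentiment indicators
- - ✅ 实时新闻情感跟踪 | Real-time news sentiment tracking
- - ✅ 投资决策辅助参考 | Supporting reference for investment decisions
-
- ## 局限性 | Limitations
-
- - ⚠️ 主要针对加密货币领域的金融新闻,其他金融领域可能表现不佳 | Targets cryptocurrency financial news; may perform poorly in other financial domains
- - ⚠️ 负面样本相对较少(16%),对负面情感的识别可能不够敏感 | Negative samples are relatively few (16%), so negative sentiment may be under-detected
- - ⚠️ 短文本(少于10字)的分析准确率可能下降 | Accuracy may drop on short texts (fewer than 10 characters)
- - ⚠️ 仅支持简体中文 | Supports Simplified Chinese only
- - ⚠️ 模型不能替代人工判断,仅供参考 | For reference only; not a substitute for human judgment
-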
- ## 许可证 | License
-
- Apache-2.0
-
- ## 引用 | Citation
-
- 如果使用本模型,请引用: | If you use this model, please cite:
-
- ```bibtex
- @misc{watchtower-sentiment-2025,
-   title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
-   author={WatchTower Team},
-   year={2025},
-   howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
-   note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
- }
- ```
-
- ## 基础模型 | Base Model
-
- 本模型基于以下模型微调: | This model is fine-tuned from:
- - [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)
-
- 感谢原作者的贡献! | Thanks to the original authors for their contribution!
-
- ## 更新日志 | Changelog
-
- ### v2.0 (2025-12-09)
- - ✅ 扩充训练数据至1000条 | Expanded the training data to 1000 samples
- - ✅ 修正标注错误,提升数据质量 | Corrected annotation errors to improve data quality
- - ✅ 优化类别分布,提升模型平衡性 | Adjusted the class distribution to improve model balance
- - ✅ F1分数提升2.01%(0.6165 → 0.6365) | F1 score improved by 2.01% (0.6165 → 0.6365)
-
- ### v1.0 (Initial Release)
- - 基于500条标注数据的初始版本 | Initial version based on 500 annotated samples
-
- ## 联系方式 | Contact
-
- 如有问题或建议,欢迎提 issue 或 PR。 | For questions or suggestions, feel free to open an issue or PR.
-
- ---
-
- **维护者 | Maintainer**: WatchTower Team
- **最后更新 | Last Updated**: 2025-12-09
 
+ ---
+ language: zh
+ license: apache-2.0
+ tags:
+ - sentiment-analysis
+ - chinese
+ - finance
+ - finbert
+ - crypto
+ - text-classification
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: Chinese Financial Sentiment Analysis (Crypto)
+   results:
+   - task:
+       type: text-classification
+       name: Sentiment Analysis
+     metrics:
+     - type: accuracy
+       value: 0.645
+       name: Accuracy
+     - type: f1
+       value: 0.6365
+       name: F1 Score
+     - type: precision
+       value: 0.6394
+       name: Precision
+     - type: recall
+       value: 0.645
+       name: Recall
+ ---
+
+ # Chinese Financial Sentiment Analysis Model (Crypto Focus)
+
+ 中文金融情感分析模型(加密货币领域)
+
+ ## 模型描述 | Model Description
+
+ 本模型基于 `yiyanghkust/finbert-tone-chinese` 微调,专门用于分析中文加密货币相关新闻和社交媒体内容的情感倾向。模型可以识别三种情感类别:正面(Positive)、中性(Neutral)和负面(Negative)。
+
+ This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` and specifically designed for sentiment analysis of Chinese cryptocurrency-related news and social media content. It can classify text into three sentiment categories: Positive, Neutral, and Negative.
+
+ ## 训练数据 | Training Data
+
+ - **数据量 | Size**: 1000条人工标注的中文金融新闻 | 1000 manually annotated Chinese financial news articles
+ - **数据来源 | Source**: 加密货币相关新闻和推文 | Cryptocurrency-related news and tweets
+ - **标注方式 | Annotation**: AI辅助 + 人工修正 | AI-assisted + Manual correction
+ - **数据分布 | Distribution**:
+   - Positive(正面): 420条 (42.0%)
+   - Neutral(中性): 420条 (42.0%)
+   - Negative(负面): 160条 (16.0%)
+
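The distribution listed for the training data is imbalanced: Negative accounts for only 16% of the samples. One common way to compensate during fine-tuning is inverse-frequency class weighting. The README does not say whether any weighting was used, so the sketch below is only an illustration of that general technique, computed from the counts quoted in the README; the function name `inverse_frequency_weights` is hypothetical.

```python
def inverse_frequency_weights(counts):
    """Return weight_c = total / (num_classes * count_c) for each class."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Class counts as listed in the README's data distribution.
counts = {"positive": 420, "neutral": 420, "negative": 160}
weights = inverse_frequency_weights(counts)

for label, w in weights.items():
    print(f"{label}: {w:.4f}")
```

Weights of this kind could be passed, in label-index order, as the `weight` tensor of `torch.nn.CrossEntropyLoss`, so that errors on the rarer Negative class cost more than errors on the two larger classes.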
+ ## 性能指标 | Performance Metrics
+
+ 在200条测试集上的表现 | Performance on 200 test samples:
+
+ | 指标 Metric | 数值 Value |
+ |-------------|-----------|
+ | 准确率 Accuracy | 64.50% |
+ | F1分数 F1 Score | 63.65% |
+ | 精确率 Precision | 63.94% |
+ | 召回率 Recall | 64.50% |
+
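Accuracy and recall are both reported as 64.50%. That is what support-weighted averaging would produce, because weighted recall is always equal to accuracy; the README does not state the averaging mode, so this reading is an assumption. The sketch below computes weighted precision, recall and F1 from a confusion matrix. The matrix is made up for illustration and is not the model's actual result.

```python
def weighted_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision = recall = f1 = 0.0
    for i in range(n):
        support = sum(cm[i])                          # true samples of class i
        predicted = sum(cm[r][i] for r in range(n))   # samples predicted as class i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        w = support / total                           # weight each class by its support
        precision += w * p
        recall += w * r
        f1 += w * f
    return accuracy, precision, recall, f1


# Hypothetical 3x3 confusion matrix over 200 samples (rows: true, columns: predicted).
cm = [
    [58, 22, 4],
    [23, 54, 7],
    [6, 11, 15],
]
accuracy, precision, recall, f1 = weighted_metrics(cm)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With weighted averaging, each class's recall is multiplied by its share of the samples, so the sum collapses to correct predictions divided by total samples, which is the accuracy.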
+ ## 使用方法 | Usage
+
+ ### 快速开始 | Quick Start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # 加载模型和分词器 | Load model and tokenizer
+ model_name = "LocalOptimum/chinese-crypto-sentiment"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # 分析文本 | Analyze text
+ text = "比特币突破10万美元创历史新高"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
+
+ # 预测 | Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=-1).item()
+
+ # 结果映射 | Result mapping
+ labels = ['positive', 'neutral', 'negative']
+ sentiment = labels[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+
+ print(f"情感 | Sentiment: {sentiment}")
+ print(f"置信度 | Confidence: {confidence:.4f}")
+ ```
+
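The quick-start snippet hard-codes the label order as `['positive', 'neutral', 'negative']`. Whether that order matches the checkpoint's own class indices is not stated in the README. In `transformers`, a loaded model's `config.id2label` carries the index-to-label mapping when one has been set, so reading labels from it may be safer than a hard-coded list. The sketch below shows only the decoding step, on plain Python lists so that it runs without downloading the model; `decode_prediction`, the mapping and the logits are hypothetical.

```python
import math


def decode_prediction(logits, id2label):
    """Softmax over one row of logits, then map the argmax index to its label."""
    m = max(logits)                                 # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Assumed mapping, mirroring the order used in the README; with a loaded model
# one would pass model.config.id2label instead.
id2label = {0: "positive", 1: "neutral", 2: "negative"}
logits = [2.1, 0.3, -1.2]                           # made-up logits for illustration

label, confidence = decode_prediction(logits, id2label)
print(f"{label} ({confidence:.4f})")
```

With the tensors from the quick-start code, `outputs.logits[0].tolist()` would supply the list of logits. It is worth checking that `model.config.id2label` holds meaningful names rather than the default `LABEL_0`, `LABEL_1`, ... before relying on it.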
+ ### 批量处理 | Batch Processing
+
+ ```python
+ texts = [
+     "币安获得阿布扎比监管授权",
+     "以太坊完成Fusaka升级",
+     "某交易所遭攻击损失100万美元"
+ ]
+
+ inputs = tokenizer(texts, return_tensors="pt", truncation=True,
+                    max_length=128, padding=True)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_classes = torch.argmax(predictions, dim=-1)
+
+ labels = ['positive', 'neutral', 'negative']
+ for text, pred in zip(texts, predicted_classes):
+     print(f"{text} -> {labels[pred]}")
+ ```
+
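+ ## 训练参数 | Training Configuration
+
+ - **基础模型 | Base Model**: yiyanghkust/finbert-tone-chinese
+ - **训练轮数 | Epochs**: 5
+ - **批次大小 | Batch Size**: 16
+ - **学习率 | Learning Rate**: 2e-5
+ - **最大序列长度 | Max Length**: 128
+ - **训练设备 | Device**: NVIDIA GeForce RTX 3060 Laptop GPU
+ - **训练时间 | Training Time**: ~5分钟 | ~5 minutes
+
+ ## 适用场景 | Use Cases
+
+ - ✅ 加密货币新闻情感分析 | Cryptocurrency news sentiment analysis
+ - ✅ 社交媒体舆情监控 | Social media opinion monitoring
+ - ✅ 金融市场情绪指标 | Financial market sentiment indicators
+ - ✅ 实时新闻情感跟踪 | Real-time news sentiment tracking
+ - ✅ 投资决策辅助参考 | Supporting reference for investment decisions
+
+ ## 局限性 | Limitations
+
+ - ⚠️ 主要针对加密货币领域的金融新闻,其他金融领域可能表现不佳 | Targets cryptocurrency financial news; may perform poorly in other financial domains
+ - ⚠️ 负面样本相对较少(16%),对负面情感的识别可能不够敏感 | Negative samples are relatively few (16%), so negative sentiment may be under-detected
+ - ⚠️ 短文本(少于10字)的分析准确率可能下降 | Accuracy may drop on short texts (fewer than 10 characters)
+ - ⚠️ 仅支持简体中文 | Supports Simplified Chinese only
+ - ⚠️ 模型不能替代人工判断,仅供参考 | For reference only; not a substitute for human judgment
+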
+ ## 许可证 | License
+
+ Apache-2.0
+
+ ## 引用 | Citation
+
+ 如果使用本模型,请引用: | If you use this model, please cite:
+
+ ```bibtex
+ @misc{watchtower-sentiment-2025,
+   title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
+   author={Onefly},
+   year={2025},
+   howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
+   note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
+ }
+ ```
+
+ ## 基础模型 | Base Model
+
+ 本模型基于以下模型微调: | This model is fine-tuned from:
+ - [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)
+
+ 感谢原作者的贡献! | Thanks to the original authors for their contribution!
+
+ ## 更新日志 | Changelog
+
+ ### v2.0 (2025-12-09)
+ - ✅ 扩充训练数据至1000条 | Expanded the training data to 1000 samples
+ - ✅ 修正标注错误,提升数据质量 | Corrected annotation errors to improve data quality
+ - ✅ 优化类别分布,提升模型平衡性 | Adjusted the class distribution to improve model balance
+ - ✅ F1分数提升2.01%(0.6165 → 0.6365) | F1 score improved by 2.01% (0.6165 → 0.6365)
+
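The v2.0 changelog reports the F1 change as 2.01% next to the rounded scores 0.6165 and 0.6365. A quick check on those rounded values helps separate an absolute gain in percentage points from a relative improvement. The small gap between the computed figure and the quoted 2.01 presumably comes from unrounded scores, which is an assumption and not something the README states.

```python
f1_v1 = 0.6165  # v1.0 F1 as quoted in the changelog
f1_v2 = 0.6365  # v2.0 F1 as quoted in the changelog

absolute_gain_points = (f1_v2 - f1_v1) * 100            # percentage points
relative_gain_percent = (f1_v2 - f1_v1) / f1_v1 * 100   # relative to the v1.0 score

print(f"absolute: {absolute_gain_points:.2f} points")
print(f"relative: {relative_gain_percent:.2f}%")
```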
+ ### v1.0 (Initial Release)
+ - 基于500条标注数据的初始版本 | Initial version based on 500 annotated samples
+
+ ## 联系方式 | Contact
+
+ 如有问题或建议,欢迎提 issue 或 PR。 | For questions or suggestions, feel free to open an issue or PR.
+
+ ---
+
+ **维护者 | Maintainer**: Onefly
+ **最后更新 | Last Updated**: 2025-12-09