---
language: zh
library_name: transformers
pipeline_tag: text-classification
tags:
- bert
- chinese
- stance-analysis
- text-classification
- pytorch
- safetensors
datasets:
- custom-stance-dataset
metrics:
- accuracy
- f1
model-index:
- name: stance-ch
  results:
  - task:
      type: text-classification
      name: Stance Analysis
    dataset:
      type: custom
      name: Chinese Stance Dataset
    metrics:
    - type: accuracy
      value: N/A
      name: Test Accuracy
    - type: accuracy
      value: N/A
      name: Validation Accuracy
---

# Chinese BERT for Stance Analysis (立场分析)

This is a BERT-based model for Chinese stance analysis. It classifies whether a text expresses a supportive (SUPPORTIVE) or opposing (OPPOSING) stance.

## Model Information

- **Base model**: bert-base-chinese
- **Task**: binary text classification
- **Language**: Chinese (中文)
- **Training data**: 6,668 stance-annotated examples
- **Labels**: OPPOSING (反对), SUPPORTIVE (支持)

## Performance

| Metric | Value |
|------|------|
| Validation accuracy | N/A |
| Test accuracy | N/A |
| Training epochs | N/A |

## Usage

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "FutureMa/stance_ch"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Class index -> stance label
LABELS = {0: "OPPOSING", 1: "SUPPORTIVE"}


def predict_stance(text):
    """Return the predicted stance and its softmax probability for one text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probabilities, dim=-1).item()

    confidence = probabilities[0][predicted_class].item()
    return {"stance": LABELS[predicted_class], "confidence": confidence}


# Example input: "I fully support this policy; it has a positive effect on social development."
text = "我完全支持这个政策,它对社会发展有积极作用。"
result = predict_stance(text)
print(f"Stance: {result['stance']}, confidence: {result['confidence']:.4f}")
```

## Data Format

The training data uses the following format:
```
topic: [topic text]
answer: [answer text]
```

The model reads the answer and predicts its stance toward the topic.

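Since the training examples pair a topic with an answer, it may help to build inference inputs in the same shape. The helper below is a minimal sketch, not the author's preprocessing code: the function name `format_stance_input` is hypothetical, and joining the two fields with a single newline is an assumption based on the template above.

```python
def format_stance_input(topic: str, answer: str) -> str:
    """Build a 'topic: ... / answer: ...' string in the layout shown above.

    Assumption: the two fields are separated by one newline; the model card
    does not state the exact separator used during training.
    """
    topic = topic.strip()
    answer = answer.strip()
    if not topic or not answer:
        raise ValueError("both topic and answer must be non-empty")
    return f"topic: {topic}\nanswer: {answer}"


# Hypothetical example pair (not taken from the training set)
formatted = format_stance_input("是否应该推广垃圾分类", "我完全支持,这对环境有好处。")
print(formatted)
```

The resulting string can be passed to `predict_stance` from the usage section in place of a bare sentence.
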
## Labels

- `OPPOSING` (0): opposing stance
- `SUPPORTIVE` (1): supportive stance

## Training Details

- **Optimizer**: AdamW
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Maximum sequence length**: 512
- **Early stopping**: training stops when validation accuracy has not improved for 3 consecutive epochs
- **Data split**: 6,268 training / 200 validation / 200 test

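The early-stopping rule listed above can be sketched in a few lines of plain Python. This is an illustration of the rule only, not the training script used for this model: the names `EarlyStopping` and `epochs_run` are hypothetical, the accuracy values are made up, and treating a tie with the best score as "no improvement" is an assumption.

```python
class EarlyStopping:
    """Signal a stop once validation accuracy fails to improve for `patience` epochs in a row."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best_accuracy = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, val_accuracy: float) -> bool:
        """Record one epoch's validation accuracy; return True if training should stop."""
        if val_accuracy > self.best_accuracy:
            # Strict improvement resets the counter (a tie counts as no improvement).
            self.best_accuracy = val_accuracy
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_accuracies, patience=3):
    """Return how many epochs would run before the rule stops training."""
    stopper = EarlyStopping(patience)
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if stopper.step(accuracy):
            return epoch
    return len(val_accuracies)


# Made-up accuracies: the best score is reached in epoch 2, so training stops after epoch 5.
print(epochs_run([0.70, 0.75, 0.74, 0.75, 0.73, 0.80]))
```

In a real training loop, `step` would be called once per epoch after evaluating on the 200-example validation set, and the checkpoint with the best validation accuracy would typically be the one kept.
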
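## Notes

1. The model was trained primarily on Chinese text.
2. Keep inputs within 512 tokens, the maximum sequence length used in training.
3. Stance analysis works comparatively well on political and social topics.
4. For a specific domain, fine-tuning on domain data before use is recommended.
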
## License

This model is released under the Apache-2.0 license.

## Citation

If you use this model, please cite:
```bibtex
@misc{stance_ch_2025,
  title={Chinese BERT for Stance Analysis},
  author={FutureMa},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/FutureMa/stance_ch}
}
```

---

*Model trained on: September 7, 2025*