Upload stance analysis model - 2025-09-07 23:28:46

Files changed (6)

README.md +139 -3
config.json +39 -0
model.safetensors +3 -0
special_tokens_map.json +7 -0
tokenizer_config.json +58 -0
vocab.txt +0 -0

README.md CHANGED

@@ -1,3 +1,139 @@
----
-license: apache-2.0
----

+---
+language: zh
+license: apache-2.0
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- bert
+- chinese
+- stance-analysis
+- text-classification
+- pytorch
+- safetensors
+datasets:
+- custom-stance-dataset
+metrics:
+- accuracy
+- f1
+model-index:
+- name: stance-ch
+  results:
+  - task:
+      type: text-classification
+      name: Stance Analysis
+    dataset:
+      type: custom
+      name: Chinese Stance Dataset
+    metrics:
+    - type: accuracy
+      value: N/A
+      name: Test Accuracy
+    - type: accuracy
+      value: N/A
+      name: Validation Accuracy
+---
+# Chinese BERT for Stance Analysis (立场分析)
+A BERT-based Chinese stance analysis model that identifies whether a text expresses a supportive (SUPPORTIVE) or opposing (OPPOSING) stance.
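+## Model Information
+- **Base model**: bert-base-chinese
+- **Task**: binary text classification
+- **Language**: Chinese (zh)
+- **Training data**: 6,668 stance-annotated examples (train, validation, and test splits combined)
+- **Labels**: OPPOSING, SUPPORTIVE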
+## Performance
+| Metric | Value |
+|------|------|
+| Validation accuracy | N/A |
+| Test accuracy | N/A |
+| Training epochs | N/A |
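
The accuracy values above are not reported. As a hedged sketch of how they, and the F1 score listed in the metadata, could be computed from gold and predicted label ids on the 200-example test split: this assumes 0 = OPPOSING and 1 = SUPPORTIVE as in config.json, and assumes macro-averaged F1, since the card does not say which averaging was used. The toy labels are made up and are not results from this model.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of examples where the predicted label id equals the gold id."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_class(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """Binary F1 treating `label` as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 (assumed: 0 = OPPOSING, 1 = SUPPORTIVE)."""
    return sum(f1_for_class(gold, pred, c) for c in labels) / len(labels)


# Toy example with made-up labels (not results from this model)
gold = [0, 0, 1, 1, 1]
pred = [0, 1, 1, 1, 0]
print(accuracy(gold, pred), round(macro_f1(gold, pred), 4))
```
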
+## Usage
+```python
+from transformers import BertTokenizer, BertForSequenceClassification
+import torch
+
+# Load the model and tokenizer
+model_name = "FutureMa/stance_ch"
+tokenizer = BertTokenizer.from_pretrained(model_name)
+model = BertForSequenceClassification.from_pretrained(model_name)
+
+# Example prediction
+def predict_stance(text):
+    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    probs = torch.softmax(logits, dim=-1)[0]  # probabilities for the single input
+    predicted_class = int(torch.argmax(probs))
+    return {
+        # id2label comes from config.json: {0: "OPPOSING", 1: "SUPPORTIVE"}
+        "stance": model.config.id2label[predicted_class],
+        "confidence": probs[predicted_class].item(),
+    }
+
+# Usage example: "I fully support this policy; it has a positive effect on social development."
+text = "我完全支持这个政策，它对社会发展有积极作用。"
+result = predict_stance(text)
+print(f"Stance: {result['stance']}, confidence: {result['confidence']:.4f}")
+```
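+## Data Format
+The training data uses the following format:
+```
+topic: [topic text]
+answer: [answer text]
+```
+The model judges the stance that the answer takes toward the topic.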
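
The card does not state exactly how topic and answer were combined before tokenization. A minimal sketch, assuming each example was fed to the tokenizer as one string with the two fields on separate lines exactly as in the format above; the helper names `format_example` and `parse_example` and the sample topic/answer are hypothetical.

```python
def format_example(topic: str, answer: str) -> str:
    """Build one input string in the assumed `topic: ...` / `answer: ...` layout."""
    # Keep the topic on a single line so the layout stays unambiguous.
    topic = topic.replace("\n", " ").strip()
    return f"topic: {topic}\nanswer: {answer.strip()}"


def parse_example(text: str) -> tuple[str, str]:
    """Split a formatted string back into (topic, answer)."""
    topic_line, answer_part = text.split("\n", 1)
    if not topic_line.startswith("topic: ") or not answer_part.startswith("answer: "):
        raise ValueError("expected a 'topic: ...' line followed by 'answer: ...'")
    return topic_line[len("topic: "):], answer_part[len("answer: "):]


# Hypothetical pair: "Should cities expand public transit?" / "I support it; it reduces congestion."
sample = format_example("城市是否应该扩建公共交通？", "我支持，这能减少拥堵。")
print(sample)
```

If this assumption holds, passing such a string to `predict_stance` would match the training layout more closely than passing the answer alone.
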
+## Labels
+- `OPPOSING` (0): opposing stance
+- `SUPPORTIVE` (1): supportive stance
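
The same mapping appears as `id2label` in config.json. As a dependency-free sketch of how raw logits become one of these labels (mirroring the softmax and argmax in the usage example), with the mapping hard-coded here as an assumption rather than read from the checkpoint:

```python
import math
from typing import Sequence

ID2LABEL = {0: "OPPOSING", 1: "SUPPORTIVE"}  # copied from config.json


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float]) -> tuple[str, float]:
    """Return (label, probability) for the highest-scoring class; ties go to the lower id."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, prob = decode([-1.2, 2.3])
print(label, round(prob, 4))
```
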
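+## Training Details
+- **Optimizer**: AdamW
+- **Learning rate**: 2e-5
+- **Batch size**: 16
+- **Max sequence length**: 512 tokens
+- **Early stopping**: stop when validation accuracy has not improved for 3 consecutive epochs
+- **Data split**: 6,268 train / 200 validation / 200 test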
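
The early-stopping rule above can be sketched as a small patience counter. This is an illustrative re-implementation of the stated rule, not the authors' training code; if the Hugging Face `Trainer` was used, `transformers.EarlyStoppingCallback(early_stopping_patience=3)` expresses the same idea, but the card does not say which approach was taken. The class name `EarlyStopper` and the accuracy history are hypothetical.

```python
class EarlyStopper:
    """Stop when the monitored score fails to improve for `patience` epochs in a row."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_accuracy: float) -> bool:
        """Record one epoch's validation accuracy; return True if training should stop."""
        if val_accuracy > self.best + self.min_delta:
            # New best: reset the counter (a checkpoint would typically be saved here).
            self.best = val_accuracy
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only
history = [0.80, 0.84, 0.86, 0.85, 0.86, 0.84, 0.88]
stopper = EarlyStopper(patience=3)
for epoch, acc in enumerate(history):
    if stopper.step(epoch, acc):
        print(f"stopping after epoch {epoch}; best epoch {stopper.best_epoch} ({stopper.best})")
        break
```

Note that matching the previous best (0.86 at epoch 4) does not count as an improvement, so the counter keeps running.
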
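+## Notes
+1. The model was trained primarily on Chinese text.
+2. Inputs are limited to 512 tokens, including the special `[CLS]` and `[SEP]` tokens; longer inputs must be truncated (the usage example does this with `truncation=True`).
+3. The model performs relatively well on stance analysis for political and social topics.
+4. Fine-tuning on data from your target domain before use is recommended.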
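
Because inputs are capped at 512 tokens including `[CLS]` and `[SEP]`, anything beyond that is dropped by truncation. One common workaround, not described by the authors, is to split the content token ids into overlapping windows of at most 510 tokens, wrap each with `[CLS]`/`[SEP]` (ids 101 and 102 in tokenizer_config.json), classify each window, and aggregate, for example by averaging probabilities. A sketch of the windowing step only; the function name `window_ids` and the stride of 128 are hypothetical choices.

```python
from typing import List

CLS_ID, SEP_ID = 101, 102  # from tokenizer_config.json
MAX_LEN = 512              # model_max_length / max_position_embeddings


def window_ids(content_ids: List[int], max_len: int = MAX_LEN, stride: int = 128) -> List[List[int]]:
    """Split content token ids (no special tokens) into overlapping [CLS] ... [SEP] windows.

    Consecutive windows share `stride` tokens so text near a boundary keeps some context.
    """
    body = max_len - 2  # room left after [CLS] and [SEP]
    if not 0 <= stride < body:
        raise ValueError("stride must be in [0, max_len - 2)")
    windows = []
    start = 0
    while True:
        chunk = content_ids[start:start + body]
        windows.append([CLS_ID] + chunk + [SEP_ID])
        if start + body >= len(content_ids):
            break
        start += body - stride
    return windows


ids = list(range(1000, 2200))  # 1,200 dummy content ids
wins = window_ids(ids)
print(len(wins), [len(w) for w in wins])
```
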
+## License
+This model is released under the Apache-2.0 license.
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{stance_ch_2025,
+  title={Chinese BERT for Stance Analysis},
+  author={FutureMa},
+  year={2025},
+  publisher={Hugging Face},
+  url={https://huggingface.co/FutureMa/stance_ch}
+}
+```
+---
+*Training date: September 7, 2025*

config.json ADDED

@@ -0,0 +1,39 @@

+{
+  "architectures": [
+    "BertForSequenceClassification"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "directionality": "bidi",
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "OPPOSING",
+    "1": "SUPPORTIVE"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "OPPOSING": 0,
+    "SUPPORTIVE": 1
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "pooler_fc_size": 768,
+  "pooler_num_attention_heads": 12,
+  "pooler_num_fc_layers": 3,
+  "pooler_size_per_head": 128,
+  "pooler_type": "first_token_transform",
+  "position_embedding_type": "absolute",
+  "problem_type": "single_label_classification",
+  "torch_dtype": "float32",
+  "transformers_version": "4.52.4",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 21128
+}
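
A quick consistency check on the label fields above: `label2id` should be the exact inverse of `id2label`, with ids running from 0 to `num_labels - 1`. A sketch using only the standard library, run here against a trimmed copy of the relevant keys (reading the full file with `json.load` works the same way); the function name `check_label_maps` is hypothetical.

```python
import json

CONFIG_SNIPPET = """
{
  "id2label": {"0": "OPPOSING", "1": "SUPPORTIVE"},
  "label2id": {"OPPOSING": 0, "SUPPORTIVE": 1},
  "problem_type": "single_label_classification"
}
"""


def check_label_maps(config: dict) -> dict:
    """Validate that id2label and label2id are inverses; return {int_id: label}."""
    # JSON object keys are always strings, so convert the ids back to int.
    id2label = {int(k): v for k, v in config["id2label"].items()}
    label2id = config["label2id"]
    if {v: k for k, v in id2label.items()} != label2id:
        raise ValueError("label2id is not the inverse of id2label")
    if sorted(id2label) != list(range(len(id2label))):
        raise ValueError("label ids must be 0..num_labels-1")
    return id2label


cfg = json.loads(CONFIG_SNIPPET)
print(check_label_maps(cfg), cfg["problem_type"])
```
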

model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b87945a5870efc3849933dbb40cf3d1792915436469f8c4e0aa9579d9798ead
+size 409100208
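
The three lines above are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 digest of the real `model.safetensors` and `size` is its length in bytes (409,100,208 bytes, consistent with roughly 102 million float32 parameters at 4 bytes each). A sketch of checking a downloaded copy against the pointer with the standard library; the function names are hypothetical and the file path is a placeholder.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def verify_against_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check the file size first (cheap), then stream the file through the hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:3b87945a5870efc3849933dbb40cf3d1792915436469f8c4e0aa9579d9798ead
size 409100208
"""

ptr = parse_lfs_pointer(POINTER)
print(ptr["algo"], ptr["size"])
# verify_against_pointer("model.safetensors", ptr)  # placeholder path to a downloaded copy
```
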

special_tokens_map.json ADDED

@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}
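
For context on how these special tokens are used: a BERT tokenizer wraps a single text as `[CLS] text [SEP]` and a text pair as `[CLS] A [SEP] B [SEP]`, with `token_type_ids` of 0 for the first segment and 1 for the second (`type_vocab_size` is 2 in config.json). Whether this model saw topic and answer as a pair or as one string is not stated. A sketch of the assembly step on already-tokenized ids, using the ids listed in tokenizer_config.json; the function name `build_bert_inputs` is hypothetical.

```python
from typing import List, Optional, Tuple

CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from tokenizer_config.json


def build_bert_inputs(
    ids_a: List[int], ids_b: Optional[List[int]] = None, pad_to: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Return (input_ids, token_type_ids, attention_mask) for one example."""
    input_ids = [CLS_ID] + ids_a + [SEP_ID]
    token_type_ids = [0] * len(input_ids)
    if ids_b is not None:
        segment_b = ids_b + [SEP_ID]
        input_ids += segment_b
        token_type_ids += [1] * len(segment_b)
    attention_mask = [1] * len(input_ids)
    # Right-pad with [PAD] (id 0); padded positions are masked out.
    pad = max(0, pad_to - len(input_ids))
    return (
        input_ids + [PAD_ID] * pad,
        token_type_ids + [0] * pad,
        attention_mask + [0] * pad,
    )


ids, types, mask = build_bert_inputs([11, 12], [21], pad_to=8)
print(ids, types, mask)
```

With the Hugging Face tokenizer, passing two texts, as in `tokenizer(topic, answer)`, produces this pair layout automatically.
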

tokenizer_config.json ADDED

@@ -0,0 +1,58 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
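
`"tokenize_chinese_chars": true` means the basic tokenizer surrounds every CJK ideograph with whitespace before WordPiece runs, so Chinese text is effectively tokenized character by character; for mostly-Chinese input, the 512-token budget therefore corresponds to roughly 510 characters. A simplified sketch of that pre-tokenization step: the code-point ranges follow the CJK blocks that BERT's `BasicTokenizer` checks, while punctuation splitting, accent handling, and WordPiece are omitted, and the function names are hypothetical.

```python
CJK_RANGES = [
    (0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x20000, 0x2A6DF), (0x2A700, 0x2B73F),
    (0x2B740, 0x2B81F), (0x2B820, 0x2CEAF), (0xF900, 0xFAFF), (0x2F800, 0x2FA1F),
]


def is_cjk_char(ch: str) -> bool:
    """True if the character's code point falls in one of the CJK ideograph blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def split_chinese_chars(text: str) -> list[str]:
    """Whitespace-split after isolating each CJK ideograph (simplified pre-tokenization)."""
    spaced = "".join(f" {ch} " if is_cjk_char(ch) else ch for ch in text)
    return spaced.split()


print(split_chinese_chars("我支持BERT模型"))
```
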

vocab.txt ADDED

The diff for this file is too large to render.